We present a measurement of the CP-violating parameter $\beta_s^{J/\psi\phi}$ using approximately 6500 $B_s^0 \to J/\psi\phi$ decays reconstructed with the CDF II detector in a sample of $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV corresponding to 5.2 fb$^{-1}$ integrated luminosity produced by the Tevatron Collider at Fermilab. We find the CP-violating phase to be within the range $\beta_s^{J/\psi\phi} \in [0.02, 0.52] \cup [1.08, 1.55]$ at 68% confidence level, where the coverage property of the quoted interval is guaranteed using a frequentist statistical analysis. This result is in agreement with the standard model expectation at the level of about one Gaussian standard deviation. We consider the inclusion of a potential S-wave contribution to the $B_s^0 \to J/\psi K^+K^-$ final state, which is found to be negligible over the mass interval $1.009 < m(K^+K^-) < 1.028$ GeV/$c^2$. Assuming the standard model prediction for the CP-violating phase $\beta_s^{J/\psi\phi}$, we find the $B_s^0$ decay width difference to be $\Delta\Gamma_s = 0.075 \pm 0.035$ (stat) $\pm 0.006$ (syst) ps$^{-1}$. We also present the most precise measurements of the $B_s^0$ mean lifetime $\tau(B_s^0) = 1.529 \pm 0.025$ (stat) $\pm 0.012$ (syst) ps, the polarization fractions $|A_0(0)|^2 = 0.524 \pm 0.013$ (stat) $\pm 0.015$ (syst) and $|A_\parallel(0)|^2 = 0.231 \pm 0.014$ (stat) $\pm 0.015$ (syst), as well as the strong phase $\delta_\perp = 2.95 \pm 0.64$ (stat) $\pm 0.07$ (syst) rad. In addition, we report an alternative Bayesian analysis that gives results consistent with the frequentist approach.

Since the discovery of the simultaneous violation of charge and parity quantum numbers (CP violation) in 1964 in the neutral kaon system [1], CP violation has played a crucial role in the development of the standard model (SM) of particle physics and in searches for "new" physics (NP) beyond the SM. In 1973, before the discovery of the fourth (charm) quark [2] and the third generation (bottom [3] and top [4] quarks), Kobayashi and Maskawa proposed an extension to a six-quark model [5] in which CP violation was explained through the quark mixing parametrized by the Cabibbo-Kobayashi-Maskawa (CKM) matrix. A single, irreducible complex phase in the CKM matrix is responsible for all CP-violating effects in the standard model.

FIG. 1. Lowest order Feynman diagrams that induce $B_s^0$-$\bar{B}_s^0$ oscillations.

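One of the most promising processes for the search for physics beyond the standard model is through oscillations of $B_s^0$ and $\bar{B}_s^0$ mesons. The time evolution of the $B_s^0$ and $\bar{B}_s^0$ mesons can be described by the Schrödinger equation

$$ i\frac{d}{dt}\begin{pmatrix} |B_s^0(t)\rangle \\ |\bar{B}_s^0(t)\rangle \end{pmatrix} = \left(M^s - \frac{i}{2}\Gamma^s\right)\begin{pmatrix} |B_s^0(t)\rangle \\ |\bar{B}_s^0(t)\rangle \end{pmatrix}, \tag{1} $$

with

$$ M^s = \begin{pmatrix} M_0^s & M_{12}^s \\ M_{12}^{s*} & M_0^s \end{pmatrix} \quad\text{and}\quad \Gamma^s = \begin{pmatrix} \Gamma_0^s & \Gamma_{12}^s \\ \Gamma_{12}^{s*} & \Gamma_0^s \end{pmatrix}, $$

where $M^s$ and $\Gamma^s$ are the mass and decay rate $2\times 2$ matrices. The $B_s^0$ mixing diagrams shown in Fig. 1 give rise to non-zero off-diagonal elements $M_{12}^s$ and $\Gamma_{12}^s$. The diagonalization of $M^s - (i/2)\Gamma^s$ leads to the heavy and light mass eigenstates $B_s^H$ and $B_s^L$, which are admixtures of the flavor eigenstates $B_s^0$ and $\bar{B}_s^0$:

$$ |B_s^H\rangle = p\,|B_s^0\rangle - q\,|\bar{B}_s^0\rangle \quad\text{and}\quad |B_s^L\rangle = p\,|B_s^0\rangle + q\,|\bar{B}_s^0\rangle, \tag{2} $$

where $p$ and $q$ are complex quantities which are related to the respective CKM matrix elements (see Fig. 1) through $q/p = (V_{tb}^* V_{ts})/(V_{tb} V_{ts}^*)$ within the SM, and satisfy $|p|^2 + |q|^2 = 1$.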

The off-diagonal elements of the mass and decay matrices can be related to the mass difference between the $B_s^H$ and $B_s^L$ mass eigenstates [6]

$$ \Delta m_s = 2\,|M_{12}^s|\left[1 + \mathcal{O}\!\left(\left|\Gamma_{12}^s/M_{12}^s\right|^2\right)\right] \tag{3} $$

and the corresponding decay width difference

$$ \Delta\Gamma_s = \Gamma_s^L - \Gamma_s^H = 2\,|\Gamma_{12}^s|\cos\phi_s\left[1 + \mathcal{O}\!\left(\left|\Gamma_{12}^s/M_{12}^s\right|^2\right)\right], \tag{4} $$

where $\phi_s = \arg(-M_{12}^s/\Gamma_{12}^s)$. Typically in the $B_s^0$ system corrections of order $(\Gamma_{12}^s/M_{12}^s)^2$ can be neglected. The total $B_s^0$ decay width $\Gamma_s = (\Gamma_s^L + \Gamma_s^H)/2 = \hbar/\tau(B_s^0)$ is related to the mean $B_s^0$ lifetime $\tau(B_s^0)$, while the mass difference $\Delta m_s$ is proportional to the frequency of $B_s^0$-$\bar{B}_s^0$ oscillations, first observed by CDF [7], where the current world average is $\Delta m_s = 17.77 \pm 0.10 \pm 0.07$ ps$^{-1}$ [8]. Assuming no CP violation in the $B_s^0$ system, which is justified in the SM where the CP-violating phase is expected to be small ($\phi_s^{SM} \approx 0.004$ [9]), the $B_s^0$ mass eigenstates are also CP eigenstates, where $\Gamma_s^L$ is the width of the CP-even state corresponding to the short-lived state, in analogy to the kaon system where the short-lived state ($K_S^0$) is CP even. $\Gamma_s^H$ is the width of the CP-odd state corresponding to the long-lived $B_s^0$ state.
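
As a numerical cross-check of the leading-order relations $\Delta m_s \approx 2|M_{12}^s|$ and $\Delta\Gamma_s \approx 2|\Gamma_{12}^s|\cos\phi_s$, the $2\times 2$ effective Hamiltonian can be diagonalized exactly. The following is only a minimal sketch under stated assumptions: the function name, the phase convention chosen for $M_{12}^s$ and $\Gamma_{12}^s$ (fixed so that $\arg(-M_{12}^s/\Gamma_{12}^s) = \phi_s$), and any input values are illustrative and are not taken from the analysis code.

```python
import cmath


def mixing_observables(m12_abs, gamma12_abs, phi_s, m0=0.0, gamma0=0.0):
    """Return (delta_m, delta_gamma) from an exact 2x2 diagonalization.

    Assumed convention (illustrative): M12 is real and positive, and
    Gamma12 = -|Gamma12| * exp(-i * phi_s), so arg(-M12 / Gamma12) = phi_s.
    """
    m12 = complex(m12_abs, 0.0)
    gamma12 = -gamma12_abs * cmath.exp(-1j * phi_s)

    # Elements of H = M - (i/2) * Gamma, with M and Gamma Hermitian
    # and equal diagonal elements.
    h11 = m0 - 0.5j * gamma0
    h12 = m12 - 0.5j * gamma12
    h21 = m12.conjugate() - 0.5j * gamma12.conjugate()

    # Eigenvalues of a 2x2 matrix with equal diagonal entries.
    root = cmath.sqrt(h12 * h21)
    eigenvalues = (h11 + root, h11 - root)

    # The heavy state has the larger real part (mass).
    heavy = max(eigenvalues, key=lambda z: z.real)
    light = min(eigenvalues, key=lambda z: z.real)

    delta_m = heavy.real - light.real
    # The width of each eigenstate is Gamma = -2 * Im(eigenvalue).
    delta_gamma = (-2.0 * light.imag) - (-2.0 * heavy.imag)
    return delta_m, delta_gamma
```

For $|\Gamma_{12}^s| \ll |M_{12}^s|$ the exact result differs from the leading-order expressions only by the small corrections of order $(\Gamma_{12}^s/M_{12}^s)^2$ mentioned above.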

A broad class of theoretically well-founded extensions of the SM predicts new sources of CP-violating phases [10]-[12]. In the presence of physics beyond the standard model, the quantities describing the $B_s^0$ system can be modified by a phase $\phi_s^{NP}$ as follows [9], [12]:

$$ \Gamma_{12}^s = \Gamma_{12}^{s,SM}, \qquad M_{12}^s = M_{12}^{s,SM} \times \Delta_s, \quad\text{where}\quad \Delta_s = |\Delta_s|\,e^{i\phi_s^{NP}}. \tag{5} $$

In this parameterization it is assumed that new physics has a negligible effect on $\Gamma_{12}^s$, which is the case for a large class of new physics models and confirmed by experimental data [6], and only $M_{12}^s$ is changed by the factor $\Delta_s$. As the precise determination of the $B_s^0$ oscillation frequency [7] is well within the standard model expectation, contributions of new physics to the magnitude $|\Delta_s|$ at the level of more than about 10-20% are unlikely [12]. A currently preferred place to search for new physics is through the phase $\phi_s$, which is unconstrained by measurements of the $B_s^0$-$\bar{B}_s^0$ oscillation frequency. Since $\phi_s^{SM}$ is small, in a new-physics scenario with a large contribution to $\phi_s$, the approximation $\phi_s = \phi_s^{SM} + \phi_s^{NP} \approx \phi_s^{NP}$ can be made.

An excellent probe of this new-physics phase [13] is through the decay mode $B_s^0 \to J/\psi\,\phi(1020)$, with $J/\psi \to \mu^+\mu^-$ and $\phi(1020) \to K^+K^-$. Note that throughout this paper we refer to $\phi(1020)$ just as $\phi$ for brevity. Figure 2 shows the leading $B_s^0 \to J/\psi\phi$ decay diagram on the left-hand side while the decay topology is indicated on the right. The relative phase between the decay amplitudes with and without mixing is $2\beta_s^{J/\psi\phi}$, which is responsible for CP violation in $B_s^0 \to J/\psi\phi$ decays. Neglecting higher order loop corrections (penguin contributions) and assuming that there is no CP violation present in the decay amplitude, $2\beta_s^{J/\psi\phi}$ can be associated with $e^{i2\beta_s^{J/\psi\phi}} = (q/p)\cdot(\bar{A}_f/A_f)$, where $A_f$ and $\bar{A}_f$ are the decay amplitudes in $B_s^0 \to J/\psi\phi$ and $\bar{B}_s^0 \to J/\psi\phi$, respectively. In the standard model this phase is $\beta_s^{SM} = \arg[-(V_{ts}V_{tb}^*)/(V_{cs}V_{cb}^*)]$, where $V_{ij}$ are again the corresponding elements of the CKM quark mixing matrix. Global fits of experimental data tightly constrain the CP-violating phase to small values in the context of the standard model, $\beta_s^{SM} \approx 0.02$ [6], [13]. The presence of new physics could modify this phase by the same quantity $\phi_s^{NP}$ that affects the phase $\phi_s$, allowing $\beta_s^{J/\psi\phi}$ to be expressed as $2\beta_s^{J/\psi\phi} = 2\beta_s^{SM} - \phi_s^{NP}$ HG3. Assuming that new physics effects are much larger than the SM phase, we can again approximate $2\beta_s^{J/\psi\phi} \approx -\phi_s^{NP} \approx -\phi_s$.

FIG. 2. Leading $B_s^0 \to J/\psi\phi$ decay diagram (top) and decay topology (bottom).

The first measurements of the CP-violating phase $\beta_s^{J/\psi\phi}$ by the CDF and D0 experiments [15], [16] each showed a mild inconsistency with the SM prediction, where interestingly both results deviated in the same direction. A preliminary combination of the CDF and D0 analyses with samples corresponding to 2.8 fb$^{-1}$ integrated luminosity was inconsistent with the SM expectation at the level of about two standard deviations [17]. In addition, recent dimuon asymmetry results from the D0 collaboration [18] give a further indication of effects of physics beyond the standard model in $B_s^0$ mixing. During the preparation of this manuscript the D0 collaboration released an updated measurement of the CP-violating phase $\beta_s^{J/\psi\phi}$ using a data sample based on 8 fb$^{-1}$ of integrated luminosity [19], while the LHCb collaboration presented a first preliminary measurement of the $B_s^0$ mixing phase showing confidence regions in agreement with the SM prediction within one standard deviation [20].

This paper presents a measurement of the CP-violating phase $\beta_s^{J/\psi\phi}$ using about four times the integrated luminosity of our previously published analysis [15], as well as additional improvements in flavor tagging and the inclusion of potential S-wave contributions to the $B_s^0 \to J/\psi\phi$ signal. This article is organized as follows. In Sec. II we give an overview of the work flow of the analysis, while we describe the CDF experiment in Sec. III. The data selection is summarized in Sec. IV. The applied flavor tagging is discussed in Sec. V and the likelihood fit function is detailed in Sec. VI. The measurements of the $B_s^0$ mean lifetime, $\Delta\Gamma_s$, the polarization fraction and the respective systematic uncertainties are described in Sec. VII and Sec. VII A, respectively. The results on $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$ using a frequentist analysis are summarized in Sec. VIII, while Sec. IX describes an alternative Bayesian approach. A summary is given in Sec. X.



II. MEASUREMENT OVERVIEW

The measurement of the phase $\beta_s^{J/\psi\phi}$ relies on an analysis of the time-evolution and kinematics of the $B_s^0 \to J/\psi\phi$ decay, which features a pseudoscalar meson decaying to two vector mesons. Consequently, the total spin in the final $J/\psi\phi$ state is either 0, 1 or 2. To conserve the total angular momentum, the orbital angular momentum $L$ between the final state decay products must be either 0, 1 or 2. While the $J/\psi$ and $\phi$ are CP-even eigenstates, the $J/\psi\phi$ final state has a CP eigenvalue given as $(-1)^L$. Consequently, the states with orbital angular momentum 0 and 2 are CP-even while the state with angular momentum 1 is CP-odd. We use both the decay time of the $B_s^0$ and the decay angles of the $J/\psi \to \mu^+\mu^-$ and $\phi \to K^+K^-$ mesons to statistically separate the CP-odd and CP-even components of the $J/\psi\phi$ final state.

There are three angles that completely define the directions of the four particles in the final state. We use the angular variables $\vec{\rho} = \{\cos\theta_T, \phi_T, \cos\psi_T\}$ as defined in the transversity basis [21]. In the following relations we use a notation where $\vec{p}(A)_B$ denotes the three-momentum of particle $A$ in the rest frame of particle $B$. With this notation, the helicity angle $\psi_T$ of the $K^+$ is defined in the $\phi$ rest frame as the angle between $\vec{p}(K^+)$ and the negative $J/\psi$ direction:

$$ \cos\psi_T = -\frac{\vec{p}(K^+)_\phi \cdot \vec{p}(J/\psi)_\phi}{|\vec{p}(K^+)_\phi| \cdot |\vec{p}(J/\psi)_\phi|}. \tag{6} $$

To calculate the other two angles, we first define a coordinate system through the directions

$$ \hat{x} = \frac{\vec{p}(\phi)_{J/\psi}}{|\vec{p}(\phi)_{J/\psi}|}, \qquad \hat{y} = \frac{\vec{p}(K^+)_{J/\psi} - [\vec{p}(K^+)_{J/\psi}\cdot\hat{x}]\,\hat{x}}{|\vec{p}(K^+)_{J/\psi} - [\vec{p}(K^+)_{J/\psi}\cdot\hat{x}]\,\hat{x}|}, \qquad \hat{z} = \hat{x}\times\hat{y}. \tag{7} $$

With this coordinate system the following angles of the direction of the $\mu^+$ in the $J/\psi$ rest frame are calculated as

$$ \cos\theta_T = \frac{\vec{p}(\mu^+)_{J/\psi}}{|\vec{p}(\mu^+)_{J/\psi}|}\cdot\hat{z}, \tag{8} $$

$$ \phi_T = \tan^{-1}\left(\frac{\vec{p}(\mu^+)_{J/\psi}\cdot\hat{y}\,/\,|\vec{p}(\mu^+)_{J/\psi}|}{\vec{p}(\mu^+)_{J/\psi}\cdot\hat{x}\,/\,|\vec{p}(\mu^+)_{J/\psi}|}\right), \tag{9} $$

where the ambiguity of the angle $\phi_T$ is resolved using the signs of $\vec{p}(\mu^+)_{J/\psi}\cdot\hat{x}$ and $\vec{p}(\mu^+)_{J/\psi}\cdot\hat{y}$. The definitions of the transversity angles are illustrated in Fig. 3.

FIG. 3. Illustration of definition of transversity angles $\theta_T$, $\phi_T$, and $\psi_T$.
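
The transversity-angle definitions above can be sketched in a few lines of code. This is a minimal illustration and not the reconstruction software used in the analysis: four-momenta are assumed to be tuples $(E, p_x, p_y, p_z)$ in natural units, and all function names are hypothetical. The sign ambiguity of $\phi_T$ is handled with the two-argument arctangent, which uses the signs of both projections.

```python
import math


def boost(p4, beta):
    """Active Lorentz boost of p4 = (E, px, py, pz) by velocity vector beta (c = 1)."""
    e, p = p4[0], p4[1:]
    b2 = sum(b * b for b in beta)
    if b2 == 0.0:
        return tuple(p4)
    gamma = 1.0 / math.sqrt(1.0 - b2)
    bp = sum(b * q for b, q in zip(beta, p))
    coeff = (gamma - 1.0) * bp / b2 + gamma * e
    return (gamma * (e + bp),) + tuple(q + coeff * b for q, b in zip(p, beta))


def in_rest_frame(p4, frame4):
    """Three-momentum of p4 as seen in the rest frame of frame4."""
    beta = tuple(-q / frame4[0] for q in frame4[1:])
    return boost(p4, beta)[1:]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def transversity_angles(mu_plus, mu_minus, k_plus, k_minus):
    """Return (cos_theta_T, phi_T, cos_psi_T) from the four final-state four-momenta."""
    jpsi = tuple(a + b for a, b in zip(mu_plus, mu_minus))
    phi = tuple(a + b for a, b in zip(k_plus, k_minus))

    # Eq. (6): K+ direction relative to the negative J/psi direction, phi rest frame.
    k_in_phi = unit(in_rest_frame(k_plus, phi))
    jpsi_in_phi = unit(in_rest_frame(jpsi, phi))
    cos_psi = -dot(k_in_phi, jpsi_in_phi)

    # Eq. (7): coordinate system defined in the J/psi rest frame.
    x = unit(in_rest_frame(phi, jpsi))
    k_in_jpsi = in_rest_frame(k_plus, jpsi)
    k_along_x = dot(k_in_jpsi, x)
    y = unit(tuple(k - k_along_x * xi for k, xi in zip(k_in_jpsi, x)))
    z = cross(x, y)

    # Eqs. (8) and (9): mu+ direction in the J/psi rest frame.
    mu = unit(in_rest_frame(mu_plus, jpsi))
    cos_theta = dot(mu, z)
    phi_t = math.atan2(dot(mu, y), dot(mu, x))
    return cos_theta, phi_t, cos_psi
```

Because every direction entering Eqs. (6)-(9) is evaluated in a single rest frame, the returned angles do not depend on the frame in which the input four-momenta are given.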

The decay is further described in terms of the polarization states of the vector mesons, either longitudinal (0), or transverse to their directions of motion, and in the latter case, parallel ($\parallel$) or perpendicular ($\perp$) to each other. The corresponding amplitudes, which depend on time $t$, are called $A_0$, $A_\parallel$ and $A_\perp$, respectively. The transverse linear polarization amplitudes $A_\parallel$ and $A_\perp$ correspond to CP-even and CP-odd final states at decay time $t = 0$, respectively. The longitudinal polarization amplitude $A_0$ corresponds to a CP-even final state. The three states in the transversity basis are easily expressed as linear combinations of states in either the helicity basis ($++$, $00$, $--$) or the orbital angular momentum basis (S, P, D). In the helicity basis, $A_\parallel$ and $A_\perp$ are linear combinations of the states with helicities $++$ and $--$, while the state corresponding to $A_0$ is the same in both transversity and helicity bases. In terms of the S, P and D-waves, the states described by $A_0$ and $A_\parallel$ are linear combinations of S and D waves, while $A_\perp$ corresponds to the P-wave state. Since only differences between the strong phases of these amplitudes are observable, we define the strong phases relative to $A_0(0)$ at time $t = 0$: $\delta_0 = 0$, $\delta_\parallel = \arg[A_\parallel(0)A_0^*(0)]$ and $\delta_\perp = \arg[A_\perp(0)A_0^*(0)]$. We note that the strong phases $\delta_\parallel$ and $\delta_\perp$ are either 0 or $\pi$ in the absence of final state $J/\psi\phi$ interactions. Deviations of these phases from 0 or $\pi$ indicate breaking of the factorization hypothesis, which assumes no interaction between the $J/\psi$ and $\phi$ in the final state [9], [13].

If the decay width difference between the $B_s^0$ mass eigenstates $\Delta\Gamma_s$ is different from zero, a time-dependent angular analysis without flavor tagging is sensitive to $\beta_s^{J/\psi\phi}$ because of the interference between CP-odd and CP-even components [22]. The sensitivity to $\beta_s^{J/\psi\phi}$ can be improved by separating mesons produced as $B_s^0$ from those produced as $\bar{B}_s^0$ in order to detect CP asymmetries in the fast $B_s^0$-$\bar{B}_s^0$ flavor oscillations given sufficient decay time resolution. The process of separating $B_s^0$ mesons from $\bar{B}_s^0$ mesons at production is called flavor tagging.

The angular dependence and the flavor-tagged (see Sec. V) time dependence are combined in an unbinned maximum likelihood fit. The fit is used to extract $\beta_s^{J/\psi\phi}$, the $B_s^0$ decay width difference $\Delta\Gamma_s$, the average $B_s^0$ lifetime, the transversity amplitudes and the strong phases. Since a contamination from $K^+K^-$ final states that do not originate from a $\phi$ decay can contribute to the $K^+K^-$ mass window used to identify $\phi$ candidates in this analysis, we consider potential contributions from other $B_s^0 \to J/\psi K^+K^-$ decays in our $B_s^0 \to J/\psi\phi$ candidate sample. In such decays the relative angular momentum of the two kaons is assumed to be zero (S-wave) as expected, for example, from $f_0(980) \to K^+K^-$ decays. Continuum $B_s^0 \to J/\psi K^+K^-$ decays with angular momentum higher than zero are expected to be suppressed. In all such cases the $K^+K^-$ system is assumed to be in a partial S-wave whose angular momentum combined with that of the $J/\psi$ leads to a CP-odd final state [23]. The S-wave contribution is included in the time-dependent angular analysis and the S-wave fraction together with its corresponding phase $\delta_{SW}$ are determined as parameters in the maximum likelihood fit. The inclusion of the S-wave in the likelihood function constitutes a significant improvement with respect to earlier measurements [15], [16].

Due to the non-Gaussian behavior of the likelihood function with respect to the parameters $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$ [15], [16], we use a frequentist analysis to obtain confidence regions for both parameters. We also determine point estimates for other parameters of interest, like the polarization fractions and the $B_s^0$ lifetime. In addition, we perform an alternative Bayesian approach, through the use of priors, applied to probability densities determined with Markov chain Monte Carlo.



III. CDF II DETECTOR AND TRIGGER 

The CDF II detector employs a cylindrical geometry around the $p\bar{p}$ interaction region with the proton direction defining the positive $z$-direction. Most of the quantities used for candidate selection are measured in the plane transverse to the $z$-axis. In the CDF coordinate system, $\phi$ is the azimuthal angle, $\theta$ is the polar angle measured from the proton direction, and $r$ is the radius perpendicular to the beam axis. The pseudorapidity $\eta$ is defined as $\eta = -\ln[\tan(\theta/2)]$. The transverse momentum, $p_T$, is the component of the particle momentum, $p$, transverse to the $z$-axis ($p_T = p \cdot \sin\theta$), while $E_T = E \cdot \sin\theta$, with $E$ being the energy measured in the calorimeter.
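
For orientation, the kinematic definitions above translate directly into code. The helper names below are illustrative assumptions and are not part of any CDF software.

```python
import math


def pseudorapidity(theta):
    """eta = -ln[tan(theta / 2)], with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))


def transverse_component(magnitude, theta):
    """p_T = p * sin(theta); the same relation gives E_T from E."""
    return magnitude * math.sin(theta)
```

A particle emitted perpendicular to the beam ($\theta = 90°$) has $\eta = 0$ and $p_T = p$, while $|\eta| = 1$ corresponds to a polar angle of about 40° from the beam axis.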

The CDF II detector features excellent lepton identification and charged particle tracking and is described in detail elsewhere [24], [25]. The parts of the detector relevant to the reconstruction of the $B_s^0 \to J/\psi\phi$ decays used in this measurement are briefly summarized below. The detector nearest to the $p\bar{p}$ interaction region is a silicon vertex detector (SVXII) [26], which consists of five concentric layers of double-sided sensors located at radii between 2.5 and 10.6 cm plus one additional single-sided layer of silicon (L00) [27] mounted directly onto the beam pipe at radius $r \sim 1.5$ cm. In addition, two forward layers plus one central layer of double-sided silicon located outside the SVXII at radii of 20-29 cm make up the intermediate silicon layers (ISL) [28]. Together with the SVXII, the ISL detector extends the sensitive region of the CDF II tracking detector to $|\eta| < 2.0$. The CDF silicon system has a typical hit resolution of $\sim 11$ $\mu$m and provides three-dimensional track reconstruction. It is used to identify displaced vertices associated with bottom hadron decays, which are reconstructed with a typical transverse resolution of 10-20 $\mu$m. The central outer tracker (COT) [29], an open-cell drift chamber with 30 240 sense wires arranged in 96 layers combined into four axial and four stereo super-layers (SL), provides tracking from a radius of $\sim 40$ cm out to a radius of 132 cm covering $|z| < 155$ cm, as well as the main measurement of track momentum with a resolution of $\sigma(p_T)/p_T \sim 0.15\%$ [GeV/$c$]$^{-1}$ for high momentum tracks. The COT also provides specific ionization energy loss, $dE/dx$, information for charged particle identification with approximately $1.5\sigma$ separation between pions and kaons with momenta greater than 2 GeV/$c$ [30]. The central tracking system is immersed in a superconducting solenoid that provides a 1.4 T axial magnetic field. Directly outside the solenoid, the time-of-flight (TOF) detector provides additional particle identification for low-momentum particles.

Central electromagnetic (CEM) [31] and hadronic (CHA) [32] calorimeters ($|\eta| < 1.1$) are located outside the COT and the solenoid, where they are arranged in a projective-tower geometry. The electromagnetic and hadronic calorimeters are lead-scintillator and iron-scintillator sampling devices, respectively. The energy resolution for the CDF central calorimeter is $\sigma(E_T)/E_T = [(13.5\%/\sqrt{E_T})^2 + (1.5\%)^2]^{1/2}$ for electromagnetic showers [24], [33] and $\sigma(E_T)/E_T = [(75\%/\sqrt{E_T})^2 + (3\%)^2]^{1/2}$ for hadrons [24], where $E_T$ is measured in GeV. A layer of proportional chambers (CES), with wire and strip readout, is located six radiation lengths deep in the CEM calorimeters, near the electromagnetic shower maximum. The CES provides a measurement of electromagnetic shower profiles in both the $\phi$ and $z$ directions for use in electron identification.
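
The two resolution formulas share the same structure: a stochastic term that falls as $1/\sqrt{E_T}$ added in quadrature to a constant term. A short sketch makes the energy dependence easy to evaluate; the function names are illustrative assumptions, and the coefficients are the ones quoted above.

```python
import math


def relative_et_resolution(et_gev, stochastic, constant):
    """sigma(E_T)/E_T = sqrt((stochastic / sqrt(E_T))^2 + constant^2), E_T in GeV."""
    return math.sqrt((stochastic / math.sqrt(et_gev)) ** 2 + constant ** 2)


def em_resolution(et_gev):
    """Central electromagnetic calorimeter: 13.5% stochastic, 1.5% constant term."""
    return relative_et_resolution(et_gev, 0.135, 0.015)


def hadronic_resolution(et_gev):
    """Central hadronic calorimeter: 75% stochastic, 3% constant term."""
    return relative_et_resolution(et_gev, 0.75, 0.03)
```

At high $E_T$ the constant term dominates, so the relative resolution approaches 1.5% for electromagnetic showers and 3% for hadrons.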

Muon candidates are identified by multi-layer drift chambers and scintillator counters [34]. Four layers of planar drift chambers (CMU) are located outside the central calorimeter at a radius of 347 cm from the beam line. The CMU system covers $|\eta| < 0.6$ and can be reached by muons with $p_T$ in excess of $\sim 1.4$ GeV/$c$. To reduce the probability of misidentifying penetrating hadrons as muon candidates in the central detector region, four additional layers of drift chambers (CMP) are located behind 0.6 m of steel outside the CMU system. Approximately 84% of the solid angle for $|\eta| < 0.6$ is covered by the CMU detector, 63% by the CMP, and 53% by both. To reach these two detectors, particles produced at the primary interaction vertex, with a polar angle of 90°, must traverse material totaling 5.5 and 8.8 pion interaction lengths, respectively. Muons with hits in both the CMU and CMP detectors are called CMUP muons. An additional set of muon chambers (CMX) is located in the pseudorapidity interval $0.6 < |\eta| < 1.0$ to extend the polar acceptance of the muon system to the forward region. Approximately 71% of the solid angle for $0.6 < |\eta| < 1.0$ is covered by the free-standing conical arches of the CMX. The calorimeter, magnet yoke of the detector, and the steel support structure provide shielding of about 6.2 pion interaction lengths.

The data used in this measurement are collected with dimuon triggers [24]. Muons are reconstructed as track stubs in the CMU, CMP and CMX chambers. Muon stubs are matched to tracks reconstructed using COT axial information from the extremely fast trigger (XFT) [35]. The dimuon trigger requires at least one central muon matching the CMU or CMUP chambers, while the second muon can be either central or forward, matching to the CMU or CMX detectors, respectively. The CMU, CMUP, or CMX muons must satisfy $p_T > 1.5$ GeV/$c$, $p_T > 3.0$ GeV/$c$ and $p_T > 2.0$ GeV/$c$, respectively. The two trigger muon candidates are required to be oppositely charged and to have an opening angle inconsistent with a cosmic ray event, and the invariant mass of the muon pair must satisfy $2.7 < m(\mu^+\mu^-) < 4$ GeV/$c^2$.



IV. DATA RECONSTRUCTION AND 
SELECTION 

We use a data sample corresponding to an integrated luminosity of 5.2 fb$^{-1}$ collected with all CDF II detector subsystems functioning. In addition, all analyzed data passed the dimuon trigger requirements given above. We begin our offline reconstruction of the $B_s^0 \to J/\psi(\to \mu^+\mu^-)\,\phi(\to K^+K^-)$ decay mode by requiring two muon candidate tracks that extrapolate to a track segment in the muon detectors, reapplying the appropriate transverse momentum requirements for the respective trigger muons using offline-reconstructed quantities. To reconstruct the $J/\psi$ candidate, a kinematic fit constraining the two oppositely charged muon candidate tracks to a common interaction point (vertex) is applied. All other charged particles in the event are assumed to be kaon candidates and combined as opposite-charge pairs to reconstruct $\phi$ meson candidates. Finally, all four candidate tracks are combined in a kinematic fit that constrains the muon candidates to the $J/\psi$ world average mass [8] and requires the four tracks to originate from a common three-dimensional vertex point.

For each event a primary interaction point is constructed from all reconstructed tracks in the event, excluding the $J/\psi$ and $\phi$ candidate tracks. This interaction point is used in the calculation of the $B_s^0$ proper decay time, $ct = m(B_s^0)\,L_{xy}(B_s^0)/p_T(B_s^0)$, where $L_{xy}(B_s^0)$ is the distance from the primary vertex point to the $B_s^0 \to J/\psi\phi$ decay vertex projected onto the momentum of the $B_s^0$ in the plane transverse to the proton beam direction, $m(B_s^0)$ is the nominal mass of the $B_s^0$ meson [8], and $p_T(B_s^0)$ is its measured transverse momentum.
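
The proper decay time definition can be illustrated with a short sketch. The function and argument names are assumptions made for this example; positions are taken in the transverse plane in cm, momenta in GeV/$c$ and the mass in GeV/$c^2$, so that $ct$ is returned in cm.

```python
import math


def proper_decay_length(primary_xy, decay_xy, pt_xy, mass):
    """Return ct = m * L_xy / p_T for a candidate.

    L_xy is the transverse displacement from the primary vertex to the decay
    vertex, projected onto the transverse momentum direction. It is signed:
    a vertex displaced opposite to the momentum gives a negative value.
    """
    dx = decay_xy[0] - primary_xy[0]
    dy = decay_xy[1] - primary_xy[1]
    pt = math.hypot(pt_xy[0], pt_xy[1])
    l_xy = (dx * pt_xy[0] + dy * pt_xy[1]) / pt
    return mass * l_xy / pt
```

Because $L_{xy}$ is a projection, finite vertex resolution can produce negative values of $ct$ for promptly produced candidates.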



A. Basic selection criteria 

For the final selection of $B_s^0 \to J/\psi\phi$ candidates, we use the NeuroBayes [36] artificial neural network (ANN) to distinguish signal events from background. Prior to the training of the ANN, we apply basic selection requirements in order to ensure track and vertex quality, and to ensure that the reconstructed particle candidates have kinematic properties appropriate for $B_s^0$, $J/\psi$, and $\phi$ mesons. These basic selection criteria are listed in Table I, which also summarizes the standard quality requirements that were imposed on track candidates. We require kaon candidates to have transverse momentum greater than 400 MeV/$c$ and all kinematic fits are required to have $\chi^2_{r\phi} < 50$, where $\chi^2_{r\phi}$ is the $\chi^2$ of the two-dimensional $r$-$\phi$ vertex fit for four degrees of freedom (dof). To select $B_s^0 \to J/\psi\phi$ candidates, we require that the invariant mass of the muon pair lies within the mass region $3.04 < m(\mu^+\mu^-) < 3.14$ GeV/$c^2$, corresponding to an interval around the world average $J/\psi$ mass [8] of about $\pm 2.5\sigma$, where $\sigma$ is the $J/\psi$ mass resolution. The invariant mass of the kaon pair is required to be within $1.009 < m(K^+K^-) < 1.028$ GeV/$c^2$, corresponding to a $\pm 2.5\sigma$ interval around the nominal $\phi$ mass [8], where $\sigma$ corresponds to the $\phi$ mass resolution. The mass of the reconstructed $J/\psi\phi$ candidate has to be in the mass window $5.1 < m(J/\psi K^+K^-) < 5.6$ GeV/$c^2$, corresponding to a $\pm 250$ MeV/$c^2$ interval around the nominal $B_s^0$ mass [8]. Additionally we require that the transverse momentum of the $\phi$ candidate is greater than 1.0 GeV/$c$ and the $B_s^0$ candidate has a transverse momentum of more than 4.0 GeV/$c$.
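
The kinematic and vertex-quality part of this preselection can be written as a simple predicate. The sketch below is illustrative only: the `Candidate` container and its field names are hypothetical, and the track-level hit requirements (COT and silicon hits) are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate record; masses in GeV/c^2, momenta in GeV/c."""
    m_mumu: float        # invariant mass of the muon pair
    m_kk: float          # invariant mass of the kaon pair
    m_jpsikk: float      # invariant mass of the four-track system
    pt_kaon_min: float   # lower transverse momentum of the two kaon tracks
    pt_phi: float        # transverse momentum of the phi candidate
    pt_bs: float         # transverse momentum of the B_s candidate
    chi2_rphi: float     # chi^2 of the two-dimensional r-phi vertex fit


def passes_basic_selection(c):
    """Apply the mass windows, momentum thresholds and vertex-quality cut."""
    return (
        3.04 < c.m_mumu < 3.14
        and 1.009 < c.m_kk < 1.028
        and 5.1 < c.m_jpsikk < 5.6
        and c.pt_kaon_min > 0.4
        and c.pt_phi > 1.0
        and c.pt_bs > 4.0
        and c.chi2_rphi < 50.0
    )
```

Only candidates passing this predicate would be passed on to the neural network described in the following subsections.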



B. Monte Carlo simulation 

We use simulated $B_s^0 \to J/\psi\phi$ Monte Carlo (MC) event samples to describe the signal in the training of the artificial neural network. These MC samples are also employed in the determination of the transversity angle efficiencies due to the non-hermeticity of the CDF II detector (see Sec. VI). We simulate the generation and fragmentation of $b$ quarks using the BGENERATOR program [37]. It is based on next-to-leading-order QCD calculations and the Peterson fragmentation function [38] tuned to the $b$-quark momentum spectrum measured at CDF [24]. The decay of the $B_s^0$ meson is simulated with the EVTGEN decay package [39]. The interaction of the generated particles with the CDF II detector is simulated with the full GEANT [40] based CDF II detector simulation package [41]. We subject the simulated events to the same trigger requirements and reconstruction process as our data events. The $B_s^0$ decays are simulated according to the phase space available to the decay, averaging over the spin states of the decay daughters. This procedure ensures that all transversity angles are generated flat for $B_s^0$ decays.

TABLE I. Basic selection requirements applied to the $B_s^0 \to J/\psi\phi$ four-track system as used in training the artificial neural network.

Quantity                      Selection requirement
COT hits                      > 2 stereo and > 2 axial super-layers with > 5 hits each
r-phi silicon hits            > 3
Kaon track p_T                > 0.4 GeV/c
Vertex chi^2_{r phi} (4 dof)  < 50
J/psi mass region             3.04 < m(mu+ mu-) < 3.14 GeV/c^2
phi mass region               1.009 < m(K+ K-) < 1.028 GeV/c^2
B_s^0 mass region             5.1 < m(J/psi K+ K-) < 5.6 GeV/c^2
p_T(phi)                      > 1.0 GeV/c
p_T(B_s^0)                    > 4.0 GeV/c



C. Selection using artificial neural network 

The information from several kinematic variables is combined into a single discriminant by the artificial neural network. Based on the discriminant, an event is classified as background-like or signal-like on a scale of $-1$ to $+1$. Correlations between variables are taken into account by the ANN, and the weight of each variable in the overall discriminant depends on its correlation with other variables. Our artificial neural network is trained on a signal sample based on 350 000 $B_s^0 \to J/\psi\phi$ Monte Carlo events. The background sample used in training the ANN consists of $\sim 300\,000$ data events taken from the $B_s^0$ invariant mass sideband regions $(5.2, 5.3) \cup (5.45, 5.55)$ GeV/$c^2$.

We use the following variables as input to the ANN, listed in order of discriminating power and relevance to the final discriminant: the transverse momentum $p_T$ of the $\phi$ meson, the kaon likelihood [22] based on TOF and $dE/dx$ information, the muon likelihood [42] for the $J/\psi$ muon daughters, $\chi^2_{r\phi}$ for the $B_s^0$ decay vertex reconstruction, the transverse momentum $p_T$ of the $B_s^0$ meson, and the probabilities to reconstruct vertices of the $B_s^0$, $\phi$, and $J/\psi$ candidates. These vertex probabilities are $\chi^2$ probabilities for the three-dimensional vertex fit, while $\chi^2_{r\phi}$ is the goodness of fit for the two-dimensional vertex fit. The muon and kaon likelihoods are quantities used for particle identification. The algorithm determining the muon likelihood is described in Ref. [42]. The kaon likelihood [22] is a combined discriminant constructed from the kaon track specific energy loss, $dE/dx$, and its time-of-flight information. Both likelihood variables have been calibrated on large control samples.



D. Optimization of selection 

Once the artificial neural network is trained, we choose a discriminant cut value that provides the best expected average resolution (sensitivity) to the CP-violating phase $\beta_s^{J/\psi\phi}$. This optimization differs from our previous analysis [15], where the significance of the $B_s^0$ signal yield was maximized using $S/\sqrt{S+B}$ as figure of merit, in which the signal $S$ was obtained from Monte Carlo simulation of $B_s^0$ events and the background $B$ was taken from the $B_s^0$ sideband regions. In the current optimization we study the sensitivity to $\beta_s^{J/\psi\phi}$ as a function of the ANN discriminant $C_{NN}$ using pseudoexperiments that are generated to mimic our data with a specific signal-to-background ratio for a given cut value on $C_{NN}$. Using the likelihood fit function described in Sec. VI, we repeat the entire analysis procedure for each pseudoexperiment and evaluate the distributions of the estimated variance on $\beta_s^{J/\psi\phi}$ to find the ANN discriminant cut value that gives the best expected average resolution for $\beta_s^{J/\psi\phi}$ [H[.
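
For comparison, the $S/\sqrt{S+B}$ figure of merit used in the previous analysis amounts to a simple scan over cut values. The sketch below is a toy illustration under stated assumptions: the function name and the input format, a list of (cut value, signal yield, background yield) entries, are hypothetical, and it does not implement the pseudoexperiment-based optimization adopted in this analysis.

```python
import math


def best_cut_by_significance(scan):
    """Return the cut value that maximizes S / sqrt(S + B).

    scan: iterable of (cut_value, n_signal, n_background) entries.
    """
    def significance(entry):
        _, n_sig, n_bkg = entry
        total = n_sig + n_bkg
        return n_sig / math.sqrt(total) if total > 0 else 0.0

    return max(scan, key=significance)[0]
```

This figure of merit rewards signal purity and therefore tends to favor tight cuts, whereas the sensitivity-based optimization also values the additional signal statistics retained by a looser cut.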

In detail, the pseudoexperiments are created by randomly sampling the probability density functions (PDF) for variables used in the fit to describe the data as outlined in Sec. VI. We simulate the effect of varying the cut on the ANN output variable by generating pseudoexperiments at different values of $S/N_{tot}$, where $N_{tot}$ is the total number of events in the $J/\psi\phi$ invariant mass window, and $S$ is the number of $B_s^0$ signal events. $N_{tot}$ and $S$ are determined from mass fits to the data for different cut values of $C_{NN}$. The input values of all other parameters in the PDF are kept the same for all pseudoexperiments corresponding to the same ANN cut; only $N_{tot}$ and $S/N_{tot}$ are varied. We take the parameter input values from the results of the unbinned maximum likelihood fit of $B_s^0 \to J/\psi\phi$ decays using data corresponding to an integrated luminosity of 2.8 fb$^{-1}$ [17]. The only exceptions are the parameters describing the tagging power, which correspond to the total tagging effectiveness of both the opposite and same side tagging algorithms (see Sec. V) valid for the full data set of 5.2 fb$^{-1}$ integrated luminosity. To verify in our optimization that the expected average resolution to $\beta_s^{J/\psi\phi}$ is independent of the true values of our parameters of interest, $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$, we generate pseudoexperiments at a few points in ($\beta_s^{J/\psi\phi}$, $\Delta\Gamma_s$) parameter space: ($\beta_s^{J/\psi\phi} = 0.5$, $\Delta\Gamma_s = 0.12$ ps$^{-1}$), ($\beta_s^{J/\psi\phi} = 0.02$, $\Delta\Gamma_s = 0.1$ ps$^{-1}$), and ($\beta_s^{J/\psi\phi} = 0.3$, $\Delta\Gamma_s = 0.09$ ps$^{-1}$). We also verify that the widths of the uncertainty distributions do not vary as a function of artificial neural network discriminant [43].

FIG. 4. Magnitude of expected uncertainty on $\beta_s^{J/\psi\phi}$ as a function of cut value on the artificial neural network discriminant for pseudoexperiments generated with input values $\beta_s^{J/\psi\phi} = 0.5$ and $\Delta\Gamma_s = 0.12$ ps$^{-1}$.

The most probable values of the $\beta_s^{J/\psi\phi}$ uncertainty for pseudoexperiments generated at ($\beta_s^{J/\psi\phi} = 0.5$, $\Delta\Gamma_s = 0.12$ ps$^{-1}$) are shown as a function of the cut value on the ANN output variable in Fig. 4. It is apparent that tight cuts on $C_{NN}$ correspond to larger expected $\beta_s^{J/\psi\phi}$ uncertainties. It should be emphasized that this technique is not intended to guarantee a particular uncertainty on $\beta_s^{J/\psi\phi}$. It is merely used to identify the trend in $\beta_s^{J/\psi\phi}$ uncertainty size as a function of cut value on the ANN output variable.

The trend in size of the $\beta_s^{J/\psi\phi}$ uncertainty distributions is similar for the other sets of ($\beta_s^{J/\psi\phi}$, $\Delta\Gamma_s$) at which pseudoexperiments are generated. The expected statistical uncertainty on $\beta_s^{J/\psi\phi}$ shows a shallow minimum around $C_{NN} \sim 0$. We adopt a cut on the ANN output discriminant at $C_{NN} > 0.2$, where the uncertainty on $\beta_s^{J/\psi\phi}$ is small, allowing us to avoid adding unnecessary amounts of background which we would include by going to a lower cut value on $C_{NN}$. For comparison, an optimization using $S/\sqrt{S+B}$ as figure of merit yields a cut value of the ANN discriminant of about 0.9, resulting in an almost 20% larger expected uncertainty on $\beta_s^{J/\psi\phi}$.

A cut on the ANN output discriminant of 0.2 yields $6504 \pm 85$ $B_s^0 \to J/\psi\phi$ signal events, as extracted by a fit to the invariant mass with a single Gaussian with flat background as shown in Fig. 5 on the left-hand side. The signal and sideband regions used to describe signal and background events in the PDF (see Sec. VI) are indicated by the areas with vertical and horizontal lines, respectively. They explicitly correspond to $5.340 < m(J/\psi K^+K^-) < 5.393$ GeV/$c^2$ for the signal region and $(5.287, 5.314) \cup (5.419, 5.446)$ GeV/$c^2$ for the sidebands. The right-hand side of Fig. 5 shows the $J/\psi K^+K^-$ invariant mass distribution with an additional lifetime requirement $ct > 60\ \mu$m on the $B_s^0$ candidate, indicating the combinatorial background to be mainly prompt.



E. Initial study of S-wave contribution 

Since this analysis considers a potential S-wave contribution, which does not come from $\phi \to K^+K^-$ decays, to the $B_s^0 \to J/\psi K^+K^-$ signal in the description of the likelihood function (see Sec. VI), we perform an initial study to obtain an estimate of the size of the S-wave contribution using a data sample corresponding to an integrated luminosity of 3.8 fb$^{-1}$. We select $\phi$ candidates in a larger $K^+K^-$ mass window $0.98 < m(K^+K^-) < 1.08$ GeV/$c^2$ and form $B_s^0$ candidates. Taking $B_s^0$ candidates from the signal region defined above, we obtain the $K^+K^-$ invariant mass distribution shown in Fig. 6. Using a binned likelihood method, we fit the $K^+K^-$ invariant mass distribution with a $\phi$ signal component modeled by a template from $B_s^0 \to J/\psi\phi$ MC simulation allowing for a mass-dependent width consistent with the parameterization in Eq. (17) (see Sec. VI). The combinatorial background is modeled by a histogram taken from the sidebands. An additional component, which takes into account $B^0$ reflections where the pion from a $B^0$ decay is misidentified as a kaon, is obtained from an inclusive $B^0 \to J/\psi X$ MC simulation. The fractions of combinatorial background and $B^0$ reflection are fixed from a fit to the $B_s^0$ invariant mass, which prevents these components from absorbing a non-resonant component in the $K^+K^-$ mass distribution if present. Together with the $\phi$ signal and the fixed components, an additional contribution is included in the fit allowing for a possible non-resonant S-wave contribution from $B_s^0 \to J/\psi K^+K^-$ or $B_s^0 \to J/\psi f_0(980)$. This component is modeled either flat in $K^+K^-$ mass or following a mass parameterization suggested in Ref. [44]. In either case the S-wave fraction is found to be compatible with zero as indicated in Fig. 6. From this study we do not expect a significant S-wave contribution across the $\phi$ mass region.



V. FLAVOR TAGGING 

To maximize the sensitivity to the CP-violating phase $\beta_s^{J/\psi\phi}$, we employ flavor tagging algorithms to determine whether the reconstructed $J/\psi\phi$ candidate was a $B_s^0$ meson or its antiparticle $\bar{B}_s^0$ at the time of production. Flavor tagging algorithms assign each $B_s^0$ meson candidate a tagging decision and a tagging dilution. The tagging decision can be $\xi = +1$ or $-1$, corresponding to a $B_s^0$ meson at production or a $\bar{B}_s^0$ meson, respectively. A value of $\xi = 0$ means the tagging algorithm failed and no tag is assigned. The tagging dilution $\mathcal{D}$ is related to the probability that the tagging decision is correct, $\mathcal{D} = 1 - 2p_W$, where $p_W$ is the probability of an incorrect tag or mis-tag. In general, the dilution is obtained by counting the number of correctly and incorrectly assigned tags, $\mathcal{D} = (N_R - N_W)/(N_R + N_W)$, where $N_R$ is the number of correct tags and $N_W$ is the number of incorrect tags. The flavor tagging performance is quantified by the product of the tagging efficiency and the squared dilution, $\epsilon\mathcal{D}^2$. The tagging efficiency $\epsilon$ is defined as the number of events that receive a tag, divided by the total number of events considered. We use two types of flavor tagging algorithms: a same side (SST) and an opposite side (OST) tagger. Due to the use of information from different event hemispheres, there is no overlap between the two taggers by construction. In particular, the SST algorithm considers only tracks within a cone of $\sqrt{(\Delta\phi)^2 + (\Delta\eta)^2} < 0.7$ around the $B_s^0$ candidate while the OST tagger only includes tracks outside that cone, allowing us to treat SST and OST as uncorrelated tagging methods.
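The definitions of dilution, efficiency and tagging power given above can be illustrated with a short sketch. The function names and the example counts are hypothetical and are not taken from the analysis; the code only restates the two equivalent dilution definitions and the figure of merit $\epsilon\mathcal{D}^2$.

```python
def dilution_from_counts(n_right, n_wrong):
    """Dilution D = (N_R - N_W) / (N_R + N_W) from counted tags."""
    n_tagged = n_right + n_wrong
    if n_tagged == 0:
        raise ValueError("no tagged events")
    return (n_right - n_wrong) / n_tagged


def dilution_from_mistag(p_wrong):
    """Dilution D = 1 - 2 p_W from the mis-tag probability."""
    return 1.0 - 2.0 * p_wrong


def tagging_power(n_tagged, n_total, dilution):
    """Figure of merit eps * D^2, with eps = n_tagged / n_total."""
    efficiency = n_tagged / n_total
    return efficiency * dilution ** 2


# Hypothetical example: 2000 candidates, 1000 tagged, 650 of them correctly.
d_counts = dilution_from_counts(650, 350)   # counting definition
d_mistag = dilution_from_mistag(350 / 1000)  # mis-tag definition
power = tagging_power(1000, 2000, d_counts)
```

Both definitions give the same dilution because $p_W = N_W/(N_R + N_W)$; a tagger with $p_W = 0.5$ carries no flavor information and has zero tagging power regardless of its efficiency.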



A. Same side tagging 

In this analysis we employ a same side kaon tagging (SSKT) algorithm which uses the charge of the kaon produced in association with the $\bar{b}$ ($b$) quark in the fragmentation process forming the $B_s^0$ ($\bar{B}_s^0$) meson as illustrated in Fig. 7. To determine the $\bar{b}$ ($b$) production flavor we attempt to find kaon tracks produced in the hadronization of the $B_s^0$ meson. The strangeness of the $B_s^0$ meson preferentially produces associated kaons in the fragmentation process. The charges of these nearby kaons are correlated to the $b$ quark content of the $B_s^0$ meson and provide an opportunity to identify the initial flavor of the $B_s^0$ meson. However, the $B_s^0$ meson can also be accompanied by a neutral kaon which cannot be used to tag the $B_s^0$ flavor and therefore lowers the tagging power. Misidentification of the associated charged kaon leads to a further decrease of the tagging dilution. The SSKT algorithm was developed on a simulated high statistics Monte Carlo $B_s^0$ sample, using the $B_s^0 \to J/\psi\phi$ and $B_s^0 \to D_s^-\pi^+$ decay modes. We use particle identification ($dE/dx$ and time-of-flight) to help identify the associated track as a kaon [45, 46].



We calibrate the SSKT algorithm [47] by measuring the $B_s^0$ mixing frequency on a data set corresponding to 5.2 fb$^{-1}$ integrated luminosity. Using CDF's Silicon Vertex Trigger [48], we select events that contain $B_s^0 \to D_s^-\pi^+$ candidates. The trigger configuration used to collect this heavy flavor data sample is described in Ref. gi]. $B_s^0 \to D_s^-\pi^+$ events are fully reconstructed in three $D_s^-$ decay modes: $D_s^- \to \phi\pi^-$ with $\phi \to K^+K^-$ (5600 events), $D_s^- \to K^{*0}K^-$ with $K^{*0} \to K^+\pi^-$ (2760 events), and $D_s^- \to \pi^-\pi^-\pi^+$ (2650 events). We also include the decay mode $B_s^0 \to D_s^-\pi^+\pi^+\pi^-$, with $D_s^- \to \phi\pi^-$ and $\phi \to K^+K^-$ (1850 events). To illustrate this sample of candidates, the left-hand side of Fig. 8 shows the invariant mass of $B_s^0 \to D_s^-\pi^+$ candidates with $D_s^- \to \phi\pi^-$ including background contributions.

FIG. 5. $J/\psi K^+K^-$ invariant mass distribution with a cut of 0.2 on the ANN discriminant (left) and in addition with a lifetime requirement $ct > 60\ \mu$m on the $B_s^0$ candidate (right). The areas with vertical (horizontal) lines indicate the signal (sideband) regions used in Sec. VI.

FIG. 6. (color online). $K^+K^-$ invariant mass distribution with combinatorial background, $B^0$ reflection, and potential S-wave contribution.

FIG. 7. Illustration of $b$ quark fragmentation into $B_s^0$ meson.

The calibration of the SSKT is achieved via an amplitude scan of the mixing frequency $\Delta m_s$. The probability for observing a meson in a $B_s^0$ or $\bar{B}_s^0$ flavor eigenstate as a function of time is

$$P(t)_{B_s^0,\,\bar{B}_s^0} \propto 1 \pm A\,\mathcal{D}_p \cos(\Delta m_s t), \qquad (10)$$

where $\mathcal{D}_p$ is the event-by-event predicted dilution and $A$ is a Fourier-like coefficient called "amplitude". The amplitude scan consists of a series of steps in which the mixing frequency $\Delta m_s$ is fixed at values between zero and 30 ps$^{-1}$. At each step, the likelihood function based on the above probability density function is maximized and the best fit value of the amplitude parameter is determined. Whenever the mixing frequency is fixed to values far from the true mixing frequency, the best fit value of the amplitude parameter is consistent with zero. In contrast, when values of $\Delta m_s$ close to the true $B_s^0$ mixing frequency are probed, the best fit value of the amplitude parameter is inconsistent with zero. If the dilution $\mathcal{D}_p$, which is predicted on an event-by-event basis by the tagging algorithms, is correct, the amplitude $A$ will be close to unity at the true value of $\Delta m_s$. Deviations from unity indicate that the predicted dilution has to be re-scaled by the actual value of the amplitude parameter at the amplitude maximum. This value of $A$ is also called the dilution scale factor $S_{\mathcal{D}}$. If the dilution scale factor is larger (smaller) than unity, the tagging algorithm under (over) estimates the predicted dilution. Multiplying the predicted SSKT dilution by $S_{\mathcal{D}}$ will then provide on average the correct event-by-event dilution.
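The logic of the amplitude scan can be sketched with a toy study. The following is a minimal illustration, not the analysis code: it assumes perfect decay-time resolution, no background, a single dilution value for all events, and hypothetical input values. Each toy event carries a decay time $t$ and a sign $q = \pm 1$ (tagged as unmixed or mixed), distributed as $(1 + q A \mathcal{D}_p \cos \Delta m_s t)/2$, and the amplitude is fitted at a fixed test frequency with Newton's method.

```python
import math
import random


def simulate_tagged_decays(n_events, delta_m, true_dilution, lifetime, seed=1):
    """Toy events (t, q): q = +1 if tagged as unmixed, -1 if tagged as mixed."""
    rng = random.Random(seed)
    events = []
    for _ in range(n_events):
        t = rng.expovariate(1.0 / lifetime)
        unmixed = rng.random() < 0.5 * (1.0 + math.cos(delta_m * t))
        tag_correct = rng.random() < 0.5 * (1.0 + true_dilution)
        q = 1 if unmixed == tag_correct else -1
        events.append((t, q))
    return events


def fit_amplitude(events, delta_m_test, predicted_dilution, n_iter=15):
    """Maximize sum_i log(1 + A * x_i), x_i = q_i * D_p * cos(dm_test * t_i)."""
    x = [q * predicted_dilution * math.cos(delta_m_test * t) for t, q in events]
    amplitude = 0.0
    for _ in range(n_iter):
        gradient = sum(xi / (1.0 + amplitude * xi) for xi in x)
        curvature = sum((xi / (1.0 + amplitude * xi)) ** 2 for xi in x)
        amplitude += gradient / curvature  # Newton step for a concave function
    return amplitude


# Hypothetical toy: true frequency 17.77 ps^-1, true dilution 0.3, lifetime 1.5 ps.
toy = simulate_tagged_decays(20000, 17.77, 0.3, 1.5)
a_at_true = fit_amplitude(toy, 17.77, 0.3)   # close to 1: prediction is correct
a_far_away = fit_amplitude(toy, 10.0, 0.3)   # close to 0: wrong frequency
a_overestimated = fit_amplitude(toy, 17.77, 0.4)  # about 0.75: dilution too large
```

In the last call the predicted dilution is too large by a factor 4/3, so the fitted amplitude comes out near 0.75, which is exactly the scale factor that restores the correct dilution.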

The result of the $\Delta m_s$ amplitude scan is shown in Fig. 8. Maximizing the likelihood as a function of $\Delta m_s$ measures the frequency of oscillations at $\Delta m_s = 17.79 \pm 0.07$ (stat) ps$^{-1}$, in agreement with the world average value of $\Delta m_s$ [8]. At this point the maximum amplitude is consistent with one and the measured dilution scale factor for the SSKT algorithm is thus consistent with unity, indicating that the initial calibration based on simulated events was accurate. We find $S_{\mathcal{D}} = 0.94 \pm 0.15$ (stat) $\pm 0.13$ (syst). This is the first time the SSKT dilution was calibrated using data only. We measure a tagging efficiency of $\epsilon = (52.2 \pm 0.7)\%$ and an average predicted dilution of $\sqrt{\langle \mathcal{D}_p^2 \rangle} = (27.5 \pm 0.3)\%$. The total tagging power is found to be $\epsilon S_{\mathcal{D}}^2 \langle \mathcal{D}_p^2 \rangle = (3.5 \pm 1.4)\%$.

FIG. 8. (color online). Left: Invariant mass of $B_s^0 \to D_s^-\pi^+$ candidates with $D_s^- \to \phi\pi^-$ and $\phi \to K^+K^-$ including background contributions. The pull distribution at the bottom shows the difference between data and fit value normalized to the data uncertainty. Right: The measured amplitude $A$ and uncertainties versus the $B_s^0$ oscillation frequency $\Delta m_s$ for the combined set of $B_s^0 \to D_s^-\pi^+$ and $B_s^0 \to D_s^-\pi^+\pi^+\pi^-$ candidates. The sensitivity curve represents $1.645\,\sigma_A$, where $\sigma_A$ is the expected uncertainty on $A$ for a given value of $\Delta m_s$. Note that the uncertainties shown are correlated between data points.

B. Opposite side tagging 

The opposite side tagger capitalizes on the fact that most $b$ quarks produced in $p\bar{p}$ collisions originate from $b\bar{b}$ pairs. The $b$ or $\bar{b}$ quark on the opposite side of the $B_s^0$ or $\bar{B}_s^0$ candidate hadronizes into a $B$ hadron, whose flavor can be inferred using its decay products.

The OST is a combination of several algorithms: the soft muon tagger (SMT) [42], the soft electron tagger (SET) [50], and the jet charge tagger (JQT) [51]. The JQT combines all tracks from the fragmentation of the opposite-side $b$ quark into a single jet charge measurement. The charge of the jet is determined by the momentum-weighted sum over the transverse momenta $p_T^i$ of all tracks in the jet, $Q_{jet} = \sum_i Q^i p_T^i (1 + P_{trk}^i) / \sum_i p_T^i (1 + P_{trk}^i)$, where $Q^i = \pm 1$ is the electric charge of track $i$ and $P_{trk}^i$ is the probability of the track being part of the $b$ jet. Jets are reconstructed by a cone-clustering algorithm and separated into three classes, based on their probability of containing a $b$ quark. A class 1 jet has a vertex displaced with respect to the primary $p\bar{p}$ interaction vertex, while a class 2 jet contains at least one track displaced with respect to the primary vertex. If no class 1 or class 2 jets are found, the jet with the highest transverse momentum in the event is used. These jets constitute class 3 jets, which can be identified for nearly 100% of the events.
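The jet charge defined above is a weighted average of track charges. The sketch below assumes the reading $Q_{jet} = \sum_i Q^i p_T^i (1 + P_{trk}^i) / \sum_i p_T^i (1 + P_{trk}^i)$ of the expression in the text; the function name and the track values are hypothetical.

```python
def jet_charge(tracks):
    """Weighted jet charge.

    tracks: iterable of (charge, pt, p_trk) with charge = +1 or -1,
    pt the transverse momentum and p_trk the probability that the
    track belongs to the b jet.
    """
    tracks = list(tracks)
    numerator = sum(q * pt * (1.0 + p_trk) for q, pt, p_trk in tracks)
    denominator = sum(pt * (1.0 + p_trk) for _, pt, p_trk in tracks)
    if denominator == 0.0:
        raise ValueError("jet has no track weight")
    return numerator / denominator


# Hypothetical two-track jet: a hard, b-like positive track and a soft negative one.
example = jet_charge([(+1, 2.0, 0.5), (-1, 1.0, 0.0)])
```

By construction the result lies between -1 and +1; tracks with high transverse momentum and high b-jet probability dominate the sign.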

The lepton taggers, SMT and SET, utilize the charge of a muon or electron from a $b \to c\,\ell^-\bar{\nu}_\ell$ transition to determine the production flavor of the parent $B$ meson. Several variables used to identify electrons and muons are combined with a multivariate technique into a global likelihood to select the lepton candidates used in the tagging algorithms [H, H3|.

The outcomes of the three OST algorithms are combined to give a single tag decision and predicted dilution 0]. We optimize the total dilution by combining the taggers using an artificial neural network trained on data from semileptonic $B \to \ell\nu X$ decays. The neural network handles correlations between the jet charge tagger and the lepton taggers and improves the tagging performance by 15% relative to other combinations that do not account for correlations.

TABLE II. Physics parameters of interest in the likelihood fit

FIG. 9. Measured versus predicted dilution for $B^+$ (left) and $B^-$ candidates (right) from $B^\pm \to J/\psi K^\pm$ events.

We calibrate the OST performance on $B^+ \to J/\psi K^+$ decays. Since charged $B$ mesons do not oscillate, the production flavor of the $B^+$ is identified by the charge of the kaon daughter track. Knowing the true flavor of the $B^+$ meson as well as the flavor predicted by the OST algorithm, we can determine the dilution measured from the number of correct and false tags as a function of the predicted dilution. This dependence is shown for $B^+$ and $B^-$ mesons in Fig. 9 and fitted with a linear function whose slope is expected to be one for a perfectly functioning OST algorithm. The actual measured slope of the linear fitting function provides the OST scale factor $S_{\mathcal{D}}$, which is determined to be $0.93 \pm 0.09$ ($1.12 \pm 0.10$) for $B^+$ ($B^-$) mesons.

Although the dilution scale factors determined separately from $B^+$ and $B^-$ decays are both, within uncertainties, consistent with unity and with each other, we use two scale factors for the opposite side tagger, one for $B^+$ mesons and one for $B^-$ mesons, in order to allow for any potential asymmetry in the tagging algorithms. As a cross-check we determine the scale factors in different data taking periods and find that the scale factors are stable throughout all parts of the data. We measure a tagging efficiency of $\epsilon = (94.2 \pm 0.4)\%$ and an average predicted dilution from the $B^\pm \to J/\psi K^\pm$ signal events of $\sqrt{\langle \mathcal{D}_p^2 \rangle} = (11.04 \pm 0.18)\%$. The average OST dilution scale factor is $S_{\mathcal{D}} = 1.03 \pm 0.06$. The total OST tagging power is $\epsilon S_{\mathcal{D}}^2 \langle \mathcal{D}_p^2 \rangle = (1.2 \pm 0.2)\%$.
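As a cross-check of the arithmetic, the quoted total tagging powers follow from the quoted efficiencies, scale factors and average dilutions. The sketch below assumes that the quoted average predicted dilutions (27.5% for the SSKT, 11.04% for the OST) are root-mean-square values $\sqrt{\langle \mathcal{D}_p^2 \rangle}$, an interpretation that reproduces the quoted central values; the function name is hypothetical and uncertainties are not propagated.

```python
def total_tagging_power(efficiency, scale_factor, rms_dilution):
    """eps * S_D^2 * <D_p^2>, with <D_p^2> given as the square of the RMS dilution."""
    return efficiency * (scale_factor * rms_dilution) ** 2


# Central values quoted in the text for the two taggers.
sskt_power = total_tagging_power(0.522, 0.94, 0.275)
ost_power = total_tagging_power(0.942, 1.03, 0.1104)

print(f"SSKT tagging power: {100 * sskt_power:.1f}%")
print(f"OST tagging power:  {100 * ost_power:.1f}%")
```

The comparison shows why the SSKT contributes most of the tagging power even though the OST tags almost every event: the figure of merit scales with the square of the dilution but only linearly with the efficiency.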



VI. THE LIKELIHOOD FUNCTION 

A simultaneous unbinned maximum likelihood fit to our data including information on the $J/\psi K^+K^-$ invariant mass, $B_s^0$ candidate decay time, and transversity angular variable $\vec{\rho}$ is performed to extract the main parameters of interest, $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$, plus additional physics parameters, which include the $B_s^0$ meson mass, the mean $B_s^0$ width $\Gamma_s = \hbar/\tau(B_s^0)$, the polarization amplitudes in the transversity basis $|A_0(0)|^2$, $|A_\parallel(0)|^2$, and $|A_\perp(0)|^2 = 1 - |A_0(0)|^2 - |A_\parallel(0)|^2$, the corresponding CP-conserving strong phases $\delta_\parallel$ and $\delta_\perp$, the fraction of the S-wave component $f_{SW}$ and its corresponding phase $\delta_{SW}$. The likelihood function also includes other technical parameters of less interest referred to as "nuisance parameters", such as the $B_s^0$ signal fraction $f_s$, parameters describing the $J/\psi\phi$ mass distribution, the $B_s^0$ decay time plus angular distributions of background events, parameters used to describe the estimated decay time uncertainty distributions for signal and background events, scale factors between the estimated decay time and mass uncertainties and their true uncertainties, as well as tagging dilution scale factors and efficiency and asymmetry parameters. There are a total of 35 fit parameters in the likelihood function, 11 of which we consider parameters of physical interest, with $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$ being the main physics parameters. All the physics parameters are listed in Table II.

and $\Delta\Gamma_s$ as main physics parameters.

transition probability at $t = 0$

Parameter          Description
$\delta_\perp$     $\arg[A_\perp(0) A_0^*(0)]$
$\delta_\parallel$ $\arg[A_\parallel(0) A_0^*(0)]$
$f_{SW}$           Fraction of S-wave in $J/\psi K^+K^-$ sample
$\delta_{SW}$      Relative phase of S-wave contribution

The distributions of the transversity angular variables $\vec{\rho} = (\cos\theta_T, \phi_T, \cos\psi_T)$ observed with the CDF II detector are different from the true distributions because the efficiency of detecting the final state muons and kaons from $J/\psi$ and $\phi$ decays is non-uniform. The angular efficiency function is parametrized in three dimensions using a set of real spherical harmonics and Legendre polynomials as basis functions with ranges $0 < \psi_T < \pi$, $0 < \theta_T < \pi$ and $0 < \phi_T < 2\pi$ [E3]:

$$\epsilon(\psi_T, \theta_T, \phi_T) = \sum_{lmk} a_k^{lm}\, P_k(\cos\psi_T)\, Y_{lm}(\theta_T, \phi_T). \qquad (11)$$

The parameters $a_k^{lm}$ are obtained from simulated events where all transversity angles are generated flat for $B_s^0$ decays. These MC events, which have been passed through the full CDF II detector simulation, enable us to examine how the initially flat distributions are sculpted by the detector acceptance, thus allowing us to determine the angle-dependent efficiencies of the reconstructed particle candidates. There are no predictions for the distribution of the background transversity angles, but we find that they can be represented as the product of three independent functions of $\cos\theta_T$, $\phi_T$, and $\cos\psi_T$ that are constant in time:

$$f(\cos\theta_T) = \frac{a_0 - a_1 \cos^2(\theta_T)}{2a_0 - 2a_1/3},$$
$$f(\phi_T) = \frac{1 + b_1 \cos(2\phi_T + b_0)}{2\pi},$$
$$f(\cos\psi_T) = \frac{c_0 + c_1 \cos^2(\psi_T)}{2c_0 + 2c_1/3}. \qquad (12)$$

The parameters $a_{0,1}$, $b_{0,1}$ and $c_{0,1}$ are determined as best fit estimates from the maximum likelihood optimization. The above functions follow closely the shapes of the angular efficiencies, which suggests that the underlying transversity angle distributions of the background events are flat.
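The background angular shapes can be checked numerically for normalization. The sketch below assumes the reading of Eq. (12) in which each function is a polynomial or cosine shape divided by its integral over the allowed range ($\cos\theta_T$ and $\cos\psi_T$ in $[-1, 1]$, $\phi_T$ in $[0, 2\pi]$); the parameter values are hypothetical.

```python
import math


def f_cos_theta(x, a0, a1):
    """Background shape in x = cos(theta_T), normalized on [-1, 1]."""
    return (a0 - a1 * x * x) / (2.0 * a0 - 2.0 * a1 / 3.0)


def f_phi(phi, b0, b1):
    """Background shape in phi_T, normalized on [0, 2*pi]."""
    return (1.0 + b1 * math.cos(2.0 * phi + b0)) / (2.0 * math.pi)


def f_cos_psi(x, c0, c1):
    """Background shape in x = cos(psi_T), normalized on [-1, 1]."""
    return (c0 + c1 * x * x) / (2.0 * c0 + 2.0 * c1 / 3.0)


def integrate(func, lower, upper, n_steps=2000):
    """Midpoint-rule integral of func over [lower, upper]."""
    step = (upper - lower) / n_steps
    return step * sum(func(lower + (i + 0.5) * step) for i in range(n_steps))


# Hypothetical parameter values, chosen only to exercise the functions.
norm_theta = integrate(lambda x: f_cos_theta(x, 1.0, 0.3), -1.0, 1.0)
norm_phi = integrate(lambda p: f_phi(p, 0.4, 0.2), 0.0, 2.0 * math.pi)
norm_psi = integrate(lambda x: f_cos_psi(x, 1.0, 0.5), -1.0, 1.0)
```

Each integral evaluates to one within the numerical precision, which is the property that lets the three factors be multiplied into a single background angular PDF.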

To set up the full unbinned maximum likelihood fit, we define a set of probability density functions (PDFs), $P(\vec{x}|\vec{\mu})$, which give the probability density of observing the measured variables $\vec{x}_i$ for an event $i$, given a set of unknown parameters $\vec{\mu}$. In our likelihood function the measured variables $\vec{x}_i$ for each event $i$ are the $J/\psi K^+K^-$ invariant mass value $m$ and its uncertainty $\sigma_m$, the $B_s^0$ candidate proper decay length $ct$ and uncertainty $\sigma_{ct}$, the angular distributions of $\vec{\rho} = (\cos\theta_T, \phi_T, \cos\psi_T)$ in the transversity basis, and the predicted dilution $\mathcal{D}_p$ and tag decision $\xi$ for the SSKT and OST method as described in Sec. V. Among the unknown fit parameters $\vec{\mu} = (\vec{\theta}, \vec{\nu})$ are the physics parameters $\vec{\theta}$ described in Table II as well as the nuisance parameters $\vec{\nu}$ discussed above.

The likelihood function for our dataset of $N$ events is given as

$$\mathcal{L}(\vec{x}|\vec{\theta},\vec{\nu}) = \prod_{i=1}^{N} P(\vec{x}_i|\vec{\theta},\vec{\nu}). \qquad (13)$$

We minimize

$$-\log \mathcal{L}(\vec{x}|\vec{\theta},\vec{\nu}) = -\sum_{i=1}^{N} \log P(\vec{x}_i|\vec{\theta},\vec{\nu}) \qquad (14)$$

using the MINUIT program package [53]. The likelihood function is composed of separate probability density functions for signal events, $P_s$, and for background events, $P_b$. Both the signal and background components contain PDFs describing the measured variables $\vec{x}_i$ of the $B_s^0$ candidate described above.
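The structure of Eqs. (13) and (14) can be illustrated with a one-parameter toy. This is only a sketch: a coarse grid scan stands in for the MINUIT minimization, the PDF is a pure exponential decay without resolution or background, and all names and values are hypothetical.

```python
import math
import random


def neg_log_likelihood(pdf, data, params):
    """-log L = -sum_i log P(x_i | params) for independent events."""
    return -sum(math.log(pdf(x, params)) for x in data)


def exponential_pdf(t, params):
    """Normalized exponential decay-time density with mean lifetime tau."""
    (tau,) = params
    return math.exp(-t / tau) / tau


def scan_minimum(pdf, data, grid):
    """Grid point with the smallest negative log-likelihood."""
    return min(grid, key=lambda value: neg_log_likelihood(pdf, data, (value,)))


# Hypothetical toy sample: 2000 decay times with a true lifetime of 1.5 ps.
rng = random.Random(7)
sample = [rng.expovariate(1.0 / 1.5) for _ in range(2000)]
tau_grid = [1.0 + 0.01 * i for i in range(101)]
best_tau = scan_minimum(exponential_pdf, sample, tau_grid)
sample_mean = sum(sample) / len(sample)
```

For an exponential density the maximum likelihood estimate of the lifetime is the sample mean, so the scanned minimum lands on a grid point adjacent to it. A gradient-based minimizer becomes necessary once the number of parameters grows, as in the 35-parameter fit described here.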

The full likelihood function, including flavor tagging, can be expressed for signal and background events as

$$\mathcal{L} = \prod_{i=1}^{N} \Big[ f_s \cdot P_s(m|\sigma_m) \cdot P_s(\xi) \cdot P_s(\theta_T, \phi_T, \psi_T, ct|\sigma_{ct}, \xi, \mathcal{D}_p) \cdot P_s(\sigma_{ct}) \cdot P_s(\mathcal{D}_p)$$
$$\qquad + (1 - f_s) \cdot P_b(m) \cdot P_b(\xi) \cdot P_b(ct|\sigma_{ct}) \cdot P_b(\theta_T) \cdot P_b(\phi_T) \cdot P_b(\psi_T) \cdot P_b(\sigma_{ct}) \cdot P_b(\mathcal{D}_p) \Big], \qquad (15)$$

where the product runs over all $N$ events in the data sample and $f_s$ and $(1 - f_s)$ are the fractions of signal and background events, respectively. For the case of the fit without flavor tagging, the likelihood function reduces to

$$\mathcal{L}_{notag} = \prod_{i=1}^{N} \Big[ f_s \cdot P_s(m|\sigma_m) \cdot P_s(\theta_T, \phi_T, \psi_T, ct|\sigma_{ct}) \cdot P_s(\sigma_{ct})$$
$$\qquad + (1 - f_s) \cdot P_b(m) \cdot P_b(ct|\sigma_{ct}) \cdot P_b(\theta_T) \cdot P_b(\phi_T) \cdot P_b(\psi_T) \cdot P_b(\sigma_{ct}) \Big], \qquad (16)$$

which simply corresponds to the flavor tagged case with no tag decision ($\xi = 0$) or a tagging dilution of zero ($\mathcal{D}_p = 0$).

In the following, we describe the individual elements of the full likelihood function in more detail, starting with the signal mass PDF $P_s(m|\sigma_m)$. We model the signal mass distribution with a Gaussian distribution of variable width. To form the probability density function, $P_s(m|\sigma_m)$, we use the candidate-by-candidate observed mass uncertainty $\sigma_m$ multiplied by a scale factor $s_m$, which is a fit parameter and accounts for a collective mis-estimation of the mass uncertainties. The PDF is normalized over the range $5.2 < m(J/\psi K^+K^-) < 5.6$ GeV/$c^2$. The background mass PDF, $P_b(m)$, is parametrized as a first order polynomial. Since the distributions of the decay time uncertainty $\sigma_{ct}$ and the event-specific dilution $\mathcal{D}_p$ are observed to be different in signal and background, we include their PDFs explicitly in the likelihood. The signal PDFs $P_s(\sigma_{ct})$ and $P_s(\mathcal{D}_p)$ are determined from sideband-subtracted data distributions, while the background PDFs $P_b(\sigma_{ct})$ and $P_b(\mathcal{D}_p)$ are determined from the $J/\psi K^+K^-$ invariant mass sidebands. The PDFs of the decay time uncertainties, $P_s(\sigma_{ct})$ and $P_b(\sigma_{ct})$, are described with a sum of normalized Gamma function distributions as described below, while the dilution PDFs $P_s(\mathcal{D}_p)$ and $P_b(\mathcal{D}_p)$ are included in the likelihood as histograms that have been extracted from data.
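The two mass PDFs described above can be sketched as follows. The Gaussian uses the per-candidate uncertainty $\sigma_m$ times the scale factor $s_m$ and is normalized on the fit range, and the background is a first order polynomial normalized on the same range. The slope parameterization, the function names and all numerical values are assumptions made for illustration only.

```python
import math

M_LO, M_HI = 5.2, 5.6  # fit range in GeV/c^2, as quoted in the text


def signal_mass_pdf(m, sigma_m, mean, s_m):
    """Gaussian of width s_m * sigma_m, normalized on [M_LO, M_HI]."""
    width = s_m * sigma_m
    gauss = math.exp(-0.5 * ((m - mean) / width) ** 2) / (math.sqrt(2.0 * math.pi) * width)
    inside = 0.5 * (math.erf((M_HI - mean) / (math.sqrt(2.0) * width))
                    - math.erf((M_LO - mean) / (math.sqrt(2.0) * width)))
    return gauss / inside


def background_mass_pdf(m, slope):
    """First order polynomial 1 + slope * (m - midpoint), normalized on the range."""
    midpoint = 0.5 * (M_LO + M_HI)
    return (1.0 + slope * (m - midpoint)) / (M_HI - M_LO)


def mass_density(m, sigma_m, f_s, mean, s_m, slope):
    """Signal plus background mass density for one candidate."""
    return f_s * signal_mass_pdf(m, sigma_m, mean, s_m) + (1.0 - f_s) * background_mass_pdf(m, slope)


def integrate(func, lower, upper, n_steps=4000):
    """Midpoint-rule integral of func over [lower, upper]."""
    step = (upper - lower) / n_steps
    return step * sum(func(lower + (i + 0.5) * step) for i in range(n_steps))


# Hypothetical values: 9 MeV/c^2 mass uncertainty, scale factor 1.2, 30% signal.
total = integrate(lambda m: mass_density(m, 0.009, 0.3, 5.366, 1.2, -0.5), M_LO, M_HI)
```

Because each component integrates to one over the fit range, the mixture does too for any signal fraction, which is what allows $f_s$ to be interpreted directly as the fraction of signal events.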

For the time and angular dependence of the signal PDF $P_s(\theta_T, \phi_T, \psi_T, ct|\sigma_{ct}, \xi, \mathcal{D}_p)$, we follow the method derived in Ref. [52]. This PDF includes the additional contribution from S-wave $B_s^0 \to J/\psi K^+K^-$ decays, with fraction $f_{SW}$ and relative phase $\delta_{SW}$ between the S-wave and P-wave amplitude. The major difference between our treatment and that of Ref. [52] is a refinement of the model used to describe the line shape of the $K^+K^-$ invariant mass $\mu = m(K^+K^-)$ [54]. As in Ref. [52] we use a flat model for the S-wave component, whereas for the P-wave $B_s^0 \to J/\psi\phi$ component we instead use an asymmetric relativistic Breit-Wigner distribution with mass-dependent width



\BW(p)f 



JL 



E(K+K~) 



(ml - p 2 ) 2 + '. 



1 tot 



(17) 



where $E(\phi)$ ($E(K^+K^-)$) is the energy of the $\phi$ ($K^+K^-$) in the decay of $B_s^0 \to J/\psi\phi$ ($B_s^0 \to J/\psi K^+K^-$). This treatment assumes a two-body decay, where the other daughter particle is the $J/\psi$, and the total decay width $\Gamma_{tot} = \Gamma_1 + \Gamma_2 + \Gamma_3$, where $\Gamma_{1,2,3}$ are the partial decay widths for the decays $\phi \to K^+K^-$ ($48.8 \pm 0.5\%$), $\phi \to K_L^0 K_S^0$ ($34.2 \pm 0.4\%$), and $\phi \to \rho\pi$ plus $\phi \to \pi^+\pi^-\pi^0$ ($15.32 \pm 0.32\%$), respectively [8]. Following Ref. [52] we describe the $B_s^0$ decay rate as a function of the transversity angles, decay time and, in addition, the $K^+K^-$ invariant mass. When both a $\phi$ component with kaons in a relative P-wave and an S-wave component are present, the amplitudes must be summed and then squared. The P-wave amplitude has a resonant structure due to the $\phi$ propagator, while the S-wave amplitude is flat, but can have an arbitrary phase $\delta_{SW}$ with respect to the P-wave:

$$\rho_B(\theta_T, \phi_T, \psi_T, t, \mu) = \frac{9}{16\pi} \left| \left[ \sqrt{1 - f_{SW}}\, BW(\mu)\, \mathbf{A}(t) + e^{i\delta_{SW}} \sqrt{f_{SW}}\, \frac{h(\mu)}{\sqrt{3}}\, \mathbf{B}(t) \right] \times \hat{n} \right|^2,$$
$$\rho_{\bar{B}}(\theta_T, \phi_T, \psi_T, t, \mu) = \frac{9}{16\pi} \left| \left[ \sqrt{1 - f_{SW}}\, BW(\mu)\, \bar{\mathbf{A}}(t) + e^{i\delta_{SW}} \sqrt{f_{SW}}\, \frac{h(\mu)}{\sqrt{3}}\, \bar{\mathbf{B}}(t) \right] \times \hat{n} \right|^2. \qquad (18)$$

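In our analysis we accept events for which the reconstructed $K^+K^-$ mass $\mu$ lies within a window $\mu_{lo} = 1.009 < \mu < \mu_{hi} = 1.028$ GeV/$c^2$. The $\phi$ mass distribution is described by the Breit-Wigner function given in Eq. (17). The S-wave mass distribution is given by the flat function $h(\mu) = 1/\sqrt{\Delta\mu}$ between $\mu_{lo}$ and $\mu_{hi}$, where $\Delta\mu = \mu_{hi} - \mu_{lo}$. The likelihood function used in the maximum likelihood fit is obtained by numerically integrating Eq. (18) over the $K^+K^-$ invariant mass $\mu$. $\mathbf{A}(t)$ and $\bar{\mathbf{A}}(t)$ are time-dependent complex vector functions describing the P-wave component in the transversity basis. They are defined as

$$\mathbf{A}(t) = \left( A_0(t)\cos\psi_T,\ -\frac{A_\parallel(t)\sin\psi_T}{\sqrt{2}},\ i\,\frac{A_\perp(t)\sin\psi_T}{\sqrt{2}} \right),$$
$$\bar{\mathbf{A}}(t) = \left( \bar{A}_0(t)\cos\psi_T,\ -\frac{\bar{A}_\parallel(t)\sin\psi_T}{\sqrt{2}},\ i\,\frac{\bar{A}_\perp(t)\sin\psi_T}{\sqrt{2}} \right),$$

with

$$A_i(t) = \frac{e^{-\Gamma_s t/2}}{\sqrt{\tau_H + \tau_L \pm \cos 2\beta_s\,(\tau_L - \tau_H)}} \left[ E_+(t) \pm e^{2i\beta_s} E_-(t) \right] a_i,$$
$$\bar{A}_i(t) = \frac{e^{-\Gamma_s t/2}}{\sqrt{\tau_H + \tau_L \pm \cos 2\beta_s\,(\tau_L - \tau_H)}} \left[ \pm E_+(t) + e^{-2i\beta_s} E_-(t) \right] a_i, \qquad (19)$$

where $i \in \{0, \parallel, \perp\}$ and the upper sign applies to the CP-even final states ($0$ and $\parallel$), while the lower sign applies to the CP-odd final state ($\perp$). Furthermore,

$$E_\pm(t) = \frac{1}{2} \left[ e^{+\left(-\frac{\Delta\Gamma_s}{4} + i\frac{\Delta m_s}{2}\right)t} \pm e^{-\left(-\frac{\Delta\Gamma_s}{4} + i\frac{\Delta m_s}{2}\right)t} \right], \qquad (20)$$

and the $a_i$ are complex amplitude parameters satisfying

$$\sum_i |a_i|^2 = 1. \qquad (21)$$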



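The S-wave component is described in the transversity basis as

$$\mathbf{B}(t) = (B(t), 0, 0), \qquad \bar{\mathbf{B}}(t) = (\bar{B}(t), 0, 0), \qquad (22)$$

where the time-dependent amplitudes are

$$B(t) = \frac{e^{-\Gamma_s t/2}}{\sqrt{\tau_H + \tau_L - \cos 2\beta_s\,(\tau_L - \tau_H)}} \left[ E_+(t) - e^{2i\beta_s} E_-(t) \right],$$
$$\bar{B}(t) = \frac{e^{-\Gamma_s t/2}}{\sqrt{\tau_H + \tau_L - \cos 2\beta_s\,(\tau_L - \tau_H)}} \left[ -E_+(t) + e^{-2i\beta_s} E_-(t) \right]. \qquad (23)$$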



In Eq. (18) the unit vector $\hat{n}$ is defined as $\hat{n} = (\sin\theta_T \cos\phi_T, \sin\theta_T \sin\phi_T, \cos\theta_T)$ in the transversity basis and the strong phases $\delta_\parallel$ and $\delta_\perp$ appear from terms of the form $A_i A_0^* = |A_i||A_0|\,e^{i\arg(A_i A_0^*)}$, where $i = \parallel$ or $i = \perp$. The decay width difference $\Delta\Gamma_s$ and the $B_s^0$ oscillation frequency $\Delta m_s$ are encoded in Eq. (20).
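The time dependence of the amplitudes in Eqs. (19) and (20) can be evaluated directly with complex arithmetic. The sketch below is one reading of those equations, under the assumed conventions $\Delta\Gamma_s = \Gamma_L - \Gamma_H$, $\Gamma_{L,H} = \Gamma_s \pm \Delta\Gamma_s/2$ and $\tau_{L,H} = 1/\Gamma_{L,H}$; the function names and input values are hypothetical. A useful sanity check is the limit $\beta_s = 0$, where a CP-even amplitude must decay with the single light-state lifetime and a CP-odd amplitude with the heavy-state lifetime.

```python
import cmath
import math


def e_plus_minus(t, delta_gamma, delta_m):
    """E_+(t) and E_-(t) built from exp(+z) and exp(-z), z = (-dG/4 + i dm/2) t."""
    z = complex(-delta_gamma / 4.0, delta_m / 2.0) * t
    return 0.5 * (cmath.exp(z) + cmath.exp(-z)), 0.5 * (cmath.exp(z) - cmath.exp(-z))


def amplitude(t, a_i, cp_sign, gamma, delta_gamma, delta_m, beta_s, antiparticle=False):
    """A_i(t) (or its antiparticle counterpart); cp_sign = +1 (CP even) or -1 (CP odd)."""
    tau_l = 1.0 / (gamma + delta_gamma / 2.0)
    tau_h = 1.0 / (gamma - delta_gamma / 2.0)
    norm = math.sqrt(tau_h + tau_l + cp_sign * math.cos(2.0 * beta_s) * (tau_l - tau_h))
    e_plus, e_minus = e_plus_minus(t, delta_gamma, delta_m)
    if antiparticle:
        bracket = cp_sign * e_plus + cmath.exp(-2j * beta_s) * e_minus
    else:
        bracket = e_plus + cp_sign * cmath.exp(2j * beta_s) * e_minus
    return math.exp(-gamma * t / 2.0) / norm * bracket * a_i


# Hypothetical inputs in ps^-1 and ps.
GAMMA, DGAMMA, DM, T = 1.0 / 1.529, 0.075, 17.77, 1.3
gamma_l, gamma_h = GAMMA + DGAMMA / 2.0, GAMMA - DGAMMA / 2.0
rate_even = abs(amplitude(T, 1.0, +1, GAMMA, DGAMMA, DM, 0.0)) ** 2
rate_odd = abs(amplitude(T, 1.0, -1, GAMMA, DGAMMA, DM, 0.0)) ** 2
```

With $\beta_s = 0$ the squared moduli reduce to $\Gamma_L e^{-\Gamma_L t}/2$ and $\Gamma_H e^{-\Gamma_H t}/2$, so the oscillation term drops out of the untagged rate and only the two lifetimes remain. A non-zero $\beta_s$ mixes both exponentials into every amplitude.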

The time-dependent functions in the PDFs above are convolved with a resolution function composed of two independent Gaussians with candidate-by-candidate expected width $\sigma_{ct}$. The widths of the two Gaussian functions are multiplied by independent scale factors $s_{ct1}$ and $s_{ct2}$, which are freely floated in the maximum likelihood fit to account for an overall mis-estimation of the decay-time resolution. The PDF describing the decay time distributions for signal events as part of $P_s(\theta_T, \phi_T, \psi_T, ct|\sigma_{ct}, \xi, \mathcal{D}_p)$ in Eq. (15) is of the form

$$P_s(ct|\sigma_{ct}) = P_s(ct, \sigma_{ct}|c\tau, s_{ct1,2}) = F(ct, c\tau) \otimes G(ct, \sigma_{ct}|f_{s_{ct1}}, s_{ct1}, s_{ct2}), \qquad (24)$$

where $\tau = \tau(B_s^0)$ and $F(ct, c\tau)$ represents the time dependence of the signal events which, e.g., for an exponential decay is given as $e^{-ct/c\tau}/(c\tau)$. The symbol "$\otimes$" denotes a convolution with respect to the decay time resolution function, defined as

$$G(ct, \sigma_{ct}|f_{s_{ct1}}, s_{ct1}, s_{ct2}) = f_{s_{ct1}} \frac{1}{\sqrt{2\pi}\, s_{ct1}\sigma_{ct}}\, e^{-\frac{c^2t^2}{2 s_{ct1}^2 \sigma_{ct}^2}} + (1 - f_{s_{ct1}}) \frac{1}{\sqrt{2\pi}\, s_{ct2}\sigma_{ct}}\, e^{-\frac{c^2t^2}{2 s_{ct2}^2 \sigma_{ct}^2}}, \qquad (25)$$

where $f_{s_{ct1}}$ is the fraction of the first resolution Gaussian. From the distribution of decay time uncertainties, we find the average of the decay time resolution function at $\sigma_{ct} \sim 30\ \mu$m, with a root-mean-square deviation of about 12 $\mu$m [15].
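For the simplest case of a pure exponential decay, the convolution with a double-Gaussian resolution function has a closed form in terms of the complementary error function. The sketch below uses that standard result; it is an illustration of Eqs. (24) and (25) for the exponential example only, and the scale factors and Gaussian fraction are hypothetical values, not fit results.

```python
import math


def exp_conv_gauss(ct, c_tau, sigma):
    """Exponential decay (ct >= 0, mean c_tau) convolved with a zero-mean Gaussian."""
    arg = (sigma / c_tau - ct / sigma) / math.sqrt(2.0)
    return 0.5 / c_tau * math.exp(0.5 * (sigma / c_tau) ** 2 - ct / c_tau) * math.erfc(arg)


def signal_ct_pdf(ct, sigma_ct, c_tau, frac1, s_ct1, s_ct2):
    """Exponential smeared by two Gaussians of widths s_ct1*sigma_ct and s_ct2*sigma_ct."""
    return (frac1 * exp_conv_gauss(ct, c_tau, s_ct1 * sigma_ct)
            + (1.0 - frac1) * exp_conv_gauss(ct, c_tau, s_ct2 * sigma_ct))


# Numerical check in micrometres: c*tau = 458.6, sigma_ct = 30, hypothetical
# Gaussian fraction 0.8 and scale factors 1.0 and 3.0.
step = 2.0
grid = [-500.0 + step * (i + 0.5) for i in range(5250)]  # covers -500 .. 10000
values = [signal_ct_pdf(ct, 30.0, 458.6, 0.8, 1.0, 3.0) for ct in grid]
total = step * sum(values)
mean_ct = step * sum(ct * v for ct, v in zip(grid, values))
```

The smeared density remains normalized, and its mean stays at $c\tau$ because the resolution function is centred on zero; the resolution populates the region $ct < 0$ but does not bias the lifetime.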

As we are using candidate-by-candidate expected decay time uncertainties, which are not distributed identically for the signal and background events, it is necessary to include a PDF for the separate uncertainty distributions [56]. The PDF describing the decay time uncertainty $P_s(\sigma_{ct})$ is constructed from normalized Gamma distributions

$$\Gamma_{norm}(\sigma_{ct}) = \frac{(\sigma_{ct})^{a}\, e^{-\sigma_{ct}/b}}{b^{a+1}\, \Gamma(a+1)}, \qquad (26)$$

where $a$ and $b$ define the mean and width of the distribution. Each function has different values of $a$ and $b$. We find these values from a separate lifetime-only fit before running the full angular analysis and fix these parameters within uncertainties in the full likelihood fit. We handle the background lifetime resolution PDF $P_b(\sigma_{ct})$ in the same way as the signal distribution, using a sum of three normalized Gamma distributions $\Gamma_{norm}$.



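The portion of the likelihood function $P_s(\theta_T, \phi_T, \psi_T, ct|\sigma_{ct}, \xi, \mathcal{D}_p)$ related to tagging can be written as follows:

$$P_s(ct, \psi_T, \theta_T, \phi_T|\sigma_{ct}, \mathcal{D}_{SS}, \mathcal{D}_{OS}, \xi_{SS}, \xi_{OS}) = \Bigg[ \frac{1 + \xi_{SS} s_{SS} \mathcal{D}_{SS}}{1 + |\xi_{SS}|}\, \frac{1 + \xi_{OS} s_{OS} \mathcal{D}_{OS}}{1 + |\xi_{OS}|}\, \rho_B(ct, \psi_T, \theta_T, \phi_T)$$
$$\qquad + \frac{1 - \xi_{SS} s_{SS} \mathcal{D}_{SS}}{1 + |\xi_{SS}|}\, \frac{1 - \xi_{OS} s_{OS} \mathcal{D}_{OS}}{1 + |\xi_{OS}|}\, \rho_{\bar{B}}(ct, \psi_T, \theta_T, \phi_T) \Bigg] \otimes G(ct, \sigma_{ct}), \qquad (27)$$

where $s_{SS}$ and $s_{OS}$ are the SSKT and OST dilution scale factors. Since the two flavor taggers search for tagging information (tracks, jets) in complementary regions in space, we treat them as independent tags. We have verified that the tagging decisions and predicted dilutions are indeed independent.

The probability of a particular combined tag decision $P(\xi)$ is dependent on the efficiency of each of the taggers:

$$P(\xi) = \begin{cases} (1 - \epsilon_{SS})(1 - \epsilon_{OS}) & (\xi_{SS} = 0,\ \xi_{OS} = 0) \\ \epsilon_{SS}(1 - \epsilon_{OS}) & (\xi_{SS} = \pm 1,\ \xi_{OS} = 0) \\ (1 - \epsilon_{SS})\,\epsilon_{OS} & (\xi_{SS} = 0,\ \xi_{OS} = \pm 1) \\ \epsilon_{SS}\,\epsilon_{OS} & (\xi_{SS} = \pm 1,\ \xi_{OS} = \pm 1) \end{cases} \qquad (28)$$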



The predicted dilution distributions are different for signal and background events. Therefore the likelihood function in Eq. (15) contains the corresponding dilution PDFs separately for signal, $P_s(\mathcal{D}_p)$, and for background events, $P_b(\mathcal{D}_p)$. These dilution distributions are measured with data. In the case of the signal, they are the sideband-subtracted distributions taken from the $B_s^0$ invariant mass signal region and stored as histograms. The histograms are normalized to represent probability densities for the dilution.

Knowledge of the fractions of positively and negatively charged events is sufficient to describe any tagging asymmetry present in the background. Therefore, the probability density $P_b(\xi)$ is equal to the fraction of events with positive tags for $\xi = +1$, and equal to the fraction of events with negative tags for $\xi = -1$. The background dilution distributions are handled analogously to the signal dilution. The dilution distributions are taken from the $B_s^0$ invariant mass sideband region, normalized to form a probability density, and again stored as histograms.
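The way the two independent taggers enter Eqs. (27) and (28) can be sketched in a few lines. The code assumes that each tagger contributes a factor $(1 \pm \xi s \mathcal{D})/(1 + |\xi|)$ to the particle and antiparticle densities and that the four tag categories are products of per-tagger efficiencies; function names and the example dilution are hypothetical.

```python
def tag_weights(xi, scale_factor, dilution):
    """Weights multiplying the particle and antiparticle densities for one tagger."""
    denominator = 1 + abs(xi)
    weight_particle = (1 + xi * scale_factor * dilution) / denominator
    weight_antiparticle = (1 - xi * scale_factor * dilution) / denominator
    return weight_particle, weight_antiparticle


def tag_category_probability(tagged_ss, tagged_os, eff_ss, eff_os):
    """Probability of a combined tag category for two independent taggers."""
    p_ss = eff_ss if tagged_ss else 1.0 - eff_ss
    p_os = eff_os if tagged_os else 1.0 - eff_os
    return p_ss * p_os


# Untagged events weight both flavors equally; a tag shifts the weights.
untagged = tag_weights(0, 0.94, 0.30)
tagged_positive = tag_weights(+1, 1.0, 0.30)

# Efficiencies quoted in the text for the SSKT (52.2%) and the OST (94.2%).
categories = [tag_category_probability(ss, os_, 0.522, 0.942)
              for ss in (False, True) for os_ in (False, True)]
```

An untagged candidate gets weights (1, 1), so the signal density reduces to the sum of the particle and antiparticle terms, which is the untagged limit of Eq. (16). A tagged candidate gets weights that sum to one and differ by the scaled dilution.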

The background proper decay time PDF, $P_b(ct|\sigma_{ct})$, is parametrized as a prompt peak modeled by a $\delta$-function plus two positive exponentials and one negative exponential. This function is convolved with the same resolution function as the signal decay time dependence. In our parameterization, the prompt peak models the majority of the combinatorial background events, which are expected to have no significant lifetime; the two positive exponentials, defined for $t > 0$, account for a small fraction of longer lived background such as other $B$ hadron decays; and the negative exponential, defined for $t < 0$, takes into account events with a mis-reconstructed vertex. The background decay time PDF reads

$$P_b(ct|\sigma_{ct}) = \left\{ f_g\, \delta(ct) + (1 - f_g) \left[ f_{++} \frac{e^{-ct/\lambda_{++}}}{\lambda_{++}} + (1 - f_{++}) \left( f_- \frac{e^{ct/\lambda_-}}{\lambda_-} + (1 - f_-) \frac{e^{-ct/\lambda_+}}{\lambda_+} \right) \right] \right\} \otimes G(ct, \sigma_{ct}|f_{s_{ct1}}, s_{ct1}, s_{ct2}), \qquad (29)$$

where $f_g$ is the fraction of the prompt background, $\lambda_{++}$, $\lambda_+$ and $\lambda_-$ are the effective lifetimes of the background events distributed according to the long and short lived positive exponentials as well as the negative exponential, respectively, while $f_{++}$ and $f_-$ are their corresponding fractions.



VII. B% MEAN LIFETIME, DECAY WIDTH 
DIFFERENCE AND POLARIZATION 
FRACTIONS 

We use the likelihood function presented in the previous
section to extract measurements of the physics parameters
of interest. Before applying the unbinned maximum
likelihood fit to data, we perform an extensive set of
tests of the fitting procedure using simulated pseudoexperiments.
We observe that, with the current statistics,
the maximized likelihood function returns biased results
for the parameters of interest. Moreover, the likelihood
function shows non-Gaussian behavior with respect to
the $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$ parameters. For these reasons we
employ frequentist techniques to determine confidence level
(C.L.) regions in the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$ plane as described in
Sec. VIII.

However, we find that the likelihood function constructed
according to the standard model expectation,
in which the CP-violation parameter $\beta_s^{J/\psi\phi}$ is fixed to
a value very close to zero ($\beta_s^{J/\psi\phi} = 0.02$), returns
unbiased results for the mean $B_s^0$ lifetime $\tau(B_s^0)$, the decay
width difference $\Delta\Gamma_s$, the polarization fractions $|A_\parallel(0)|^2$
and $|A_0(0)|^2$ as well as the strong phase $\delta_\perp$. We also
observe that the likelihood function is Gaussian with
respect to all these parameters. Under these favorable
circumstances we provide point estimates with Gaussian
uncertainties for the following quantities:

$c\tau(B_s^0) = 458.6 \pm 7.6$ (stat) $\pm\ 3.6$ (syst) $\mu$m,
$\Delta\Gamma_s = 0.075 \pm 0.035$ (stat) $\pm\ 0.006$ (syst) ps$^{-1}$,
$|A_\parallel(0)|^2 = 0.231 \pm 0.014$ (stat) $\pm\ 0.015$ (syst),
$|A_0(0)|^2 = 0.524 \pm 0.013$ (stat) $\pm\ 0.015$ (syst),
$\delta_\perp = 2.95 \pm 0.64$ (stat) $\pm\ 0.07$ (syst) rad. (30)

The systematic uncertainties on the measured quantities
are discussed in detail in Sec. VII A and given above
for completeness. We are unable to also quote a result
for $\delta_\parallel$ since the fit prefers a value of $\delta_\parallel$ at the boundary
of $\pi$, resulting in a non-Gaussian likelihood shape around
the minimum.

The results in Eq. (30) show good agreement with previous
measurements [15, 16]. The $B_s^0$ mean lifetime
can be calculated as $\tau(B_s^0) = 1.529 \pm 0.025$ (stat) $\pm$
0.012 (syst) ps, which is the most precise single measurement
of this quantity. It can be compared to the most
recent measurement from the D0 collaboration using a
data sample based on 8 fb$^{-1}$ of integrated luminosity [19]
quoting $\tau(B_s^0) = 1.443^{+0.038}_{-0.035}$ ps and to the Particle Data
Group (PDG) average of $\tau(B_s^0) = 1.472^{+0.024}_{-0.026}$ ps [8]. The
$\Delta\Gamma_s$ value is of comparable precision to the current world
average of $\Delta\Gamma_s = 0.062^{+0.034}_{-0.037}$ ps$^{-1}$ [8]. Our central value
is somewhat smaller than the most recent measurement
of $\Delta\Gamma_s = 0.163^{+0.065}_{-0.064}$ ps$^{-1}$ from the D0 collaboration [19]
but compares well to the PDG average as well as the SM
prediction $\Delta\Gamma_s = 0.090 \pm 0.024$ ps$^{-1}$ [§].
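
The step from the fitted decay length $c\tau(B_s^0)$ in Eq. (30) to the quoted mean lifetime is a unit conversion with the speed of light. The sketch below is only an illustration of that arithmetic; the function name is ours, and the inputs are the rounded values of Eq. (30).

```python
C_UM_PER_PS = 299.792458  # speed of light in micrometres per picosecond

def ctau_to_tau(ctau_um):
    """Convert a proper decay length in micrometres to a lifetime in picoseconds."""
    return ctau_um / C_UM_PER_PS

tau = ctau_to_tau(458.6)   # central value of c*tau from Eq. (30)
stat = ctau_to_tau(7.6)    # statistical uncertainty
syst = ctau_to_tau(3.6)    # systematic uncertainty
print(f"tau = {tau:.4f} +- {stat:.4f} (stat) +- {syst:.4f} (syst) ps")
```

Starting from the rounded decay length, the central value comes out within about 0.001 ps of the quoted 1.529 ps. A residual difference of that size is what one would expect if the quoted lifetime was derived from the unrounded fit result, which is an assumption on our part.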

In addition to comparing the fit results with predictions
and other measurements, three cross-checks are performed
using alternative versions of the fit. First we use
the likelihood without flavor tagging to check for any bias
which could be introduced by the tagging. The fit without
flavor tagging does not have sensitivity to $\delta_\perp$. This
quantity is thus omitted from Table III for fits without
flavor tagging. The second and third checks are done
without the S-wave component included in the likelihood
fit, once with flavor tagging included and once without.
These checks are made to determine whether the S-wave
component has a significant effect on the fit results and
to provide a direct comparison with previous CDF results
[HI which did not account for the S-wave component.
Table III shows the results of these cross-checks,
which demonstrate good agreement between different versions
of the fit and our main results, which include both
flavor tagging and the S-wave component.

Since the unbinned maximum likelihood method does
not readily provide a goodness of fit estimator, we present
fit projections onto the data to support the quality of
the fit. The likelihood function, in which all parameters
are fixed to their best-fit values, is overlaid on top
of data distributions. Such projections are performed
for both signal and background events and separately
in the subspaces of the $B_s^0$ decay time, expected decay time
uncertainty and transversity angles. The fit projections
for the proper decay time and proper decay time
uncertainties are shown in Figs. 10 and 11. Fit projections
for the transversity angles $\cos\theta_T$, $\phi_T$, and $\cos\psi_T$
from the sideband-subtracted signal region and the background
(sideband) region are shown in Figs. 12 and 13,
respectively. The good agreement between the data and
fit projections validates our parameterization of both the
signal and background distributions plus their uncertainties.



A. Systematic uncertainties 

To assess systematic uncertainties on quantities of interest
other than $\beta_s^{J/\psi\phi}$, namely $\Delta\Gamma_s$, the $B_s^0$ mean lifetime,
the polarization fractions, and the strong phase
$\delta_\perp$, we set the CP-violating phase to its standard model
expectation $\beta_s^{J/\psi\phi} = 0.02$. This choice in addition improves
the statistical behavior of the likelihood function
by eliminating biases on these parameters.

Systematic uncertainties are assigned by considering
several effects that are not accounted for in the likelihood
fit [57]. Such effects include potential mis-parameterization
in the fit model, impact of particular
assumptions in the fit model, and physical effects which
are not well known or fully incorporated into the model.
To estimate the size of the systematic uncertainties, we
generate two sets of pseudoexperiments by extracting
random numbers distributed according to our PDF: one
set with each of the considered systematic variations and
another set of default pseudoexperiments. Each pair of
modified and unmodified pseudoexperiments is generated
with the same random seed. The unbinned likelihood
function is maximized over the modified pseudoexperiments
as well as over the corresponding default
ones. For each systematic effect, the associated uncertainty
is the difference between the mean of the best fit
value for the pseudoexperiments with the systematic alteration
included, and the equivalent mean value for the
reference set of pseudoexperiments generated with the
default model. The individual systematic uncertainties
are summed in quadrature to give the total systematic
uncertainty on each parameter, as presented in Table IV.
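
The paired-pseudoexperiment procedure can be sketched with a deliberately simple toy. Everything below is an assumption made for illustration: the "default" model is a single exponential lifetime, the "systematic variation" is a 2% admixture of a longer-lived component, and the fit is the analytic maximum likelihood estimate of an exponential lifetime (the sample mean). Only the logic, common seeds and a difference of mean fit results, follows the text.

```python
import math
import random

def generate_pair(seed, n_events, tau=1.5, frac_alt=0.02, tau_alt=3.0):
    """Generate a default and a modified pseudoexperiment from the same seed."""
    rng = random.Random(seed)
    default, modified = [], []
    for _ in range(n_events):
        u_select, u_time = rng.random(), rng.random()
        t = -math.log(1.0 - u_time)          # unit-lifetime exponential deviate
        default.append(tau * t)              # default model: pure exponential
        # modified model: a small fraction of events has a longer lifetime
        modified.append((tau_alt if u_select < frac_alt else tau) * t)
    return default, modified

def fit_lifetime(sample):
    """ML estimate of an exponential lifetime, which is the sample mean."""
    return sum(sample) / len(sample)

def systematic_shift(n_pseudo=50, n_events=2000):
    """Difference of mean fit results: modified minus default ensembles."""
    fits_default, fits_modified = [], []
    for seed in range(n_pseudo):
        default, modified = generate_pair(seed, n_events)
        fits_default.append(fit_lifetime(default))
        fits_modified.append(fit_lifetime(modified))
    return sum(fits_modified) / n_pseudo - sum(fits_default) / n_pseudo

shift = systematic_shift()
print(f"systematic shift on the fitted lifetime: {shift:.4f}")
```

Using the same seed for both members of a pair means that statistical fluctuations largely cancel in the difference, so the shift (expected to be near $0.02 \times 1.5 = 0.03$ in this toy) is resolved with far fewer pseudoexperiments than independent ensembles would need.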

One source of systematic uncertainty is the modeling of
the angular efficiency of the detector described in Sec. VI.
We model the detector efficiency with a linear combination
of Legendre polynomials and spherical harmonics
as described in detail in Ref. [52]. The expansion
coefficients of these functions are obtained by fitting a
three-dimensional efficiency distribution obtained using
simulated events. This simulated sample is re-weighted
to match the $p_T$ distributions observed in data. If the
modeling is inaccurate, or the $p_T$ re-weighting incorrect,
a systematic uncertainty could be introduced. We test
these effects separately. The former effect is investigated
by using the default fit model on pseudoexperiments
generated with angular efficiencies taken directly from the
background angular distributions rather than the default
parameterization. The latter effect is investigated by
generating pseudoexperiments with non-reweighted MC
events as input for the angular efficiencies. The second
test is a rather extreme case, but shows only a small
systematic effect.

The next systematic uncertainty that we consider is
the modeling of the signal $B_s^0$ mass distribution, which
is fitted by default with a single Gaussian distribution.
If a double Gaussian model is used, the fit quality is
comparable, and we evaluate a systematic uncertainty
from the comparison of the two fit models. To test
the size of a potential systematic effect, we generate
pseudoexperiments with a double-Gaussian signal mass
model, extracted from data, and fit with the usual
single-Gaussian parameterization.

Similarly, the model used for the mass distribution of
combinatorial background events can be changed to a
second order polynomial with comparable fit quality. To
study this effect, we generate pseudoexperiments with
a second-order polynomial background model instead of
the default first order polynomial and fit with the default
straight line. The results of both tests and the corresponding
systematic uncertainties are listed in Table IV.

A particularly important effect to consider for the 
lifetime measurement is the lifetime resolution model. 
In our standard fit we model the detector resolution 
by convolving each lifetime component with a two- 
Gaussian resolution function. To test the effect of a mis- 
parameterization, we generate pseudoexperiments with 

FIG. 10. (color online). Proper decay time fit projections for the $B_s^0$ signal (left) and background (right) regions. The dashed
distributions labeled as "Light" and "Heavy" indicate the contribution of the $B_s^L$ and $B_s^H$, respectively. The pull distributions
at the bottom show the difference between data and fit value normalized to the data uncertainty.

FIG. 11. (color online). Proper decay time uncertainty fit projections for the $B_s^0$ signal (left) and background (right) regions.
The pull distributions at the bottom show the difference between data and fit value normalized to the data uncertainty.

FIG. 12. Fit projections for transversity angles for sideband-subtracted signal.

TABLE III. Results of alternative fits to cross-check the main SM results. All uncertainties quoted are statistical only. As
shown in Ref. [22] the untagged analysis is not sensitive to the strong phase $\delta_\perp$.

Parameter                        Un-tagged (with S-wave)   Tagged (without S-wave)   Un-tagged (without S-wave)
$c\tau(B_s^0)$ [$\mu$m]          456.9 ± 8.0               459.1 ± 7.7               457.2 ± 7.9
$\Delta\Gamma_s$ [ps$^{-1}$]     0.069 ± 0.030             0.073 ± 0.030             0.070 ± 0.040
$|A_\parallel(0)|^2$             0.232 ± 0.032             0.232 ± 0.014             0.233 ± 0.016
$|A_0(0)|^2$                     0.521 ± 0.013             0.523 ± 0.012             0.520 ± 0.013
$\delta_\perp$ [rad]             -                         2.8 ± 0.6                 -

FIG. 13. Fit projections for transversity angles in background region.



TABLE IV. Summary of systematic uncertainties assigned to the five physics quantities discussed in Sec. VII.

Source of systematic effect          $\Delta\Gamma_s$ [ps$^{-1}$]   $c\tau(B_s^0)$ [$\mu$m]   $|A_\parallel(0)|^2$   $|A_0(0)|^2$   $\delta_\perp$ [rad]
Signal efficiency
  Parameterization                   0.0024                         0.96                      0.0076                 0.008          0.016
  MC re-weighting                    0.0008                         0.94                      0.0129                 0.0129         0.022
Signal mass model                    0.0013                         0.26                      0.0009                 0.0011         0.009
Background mass model                0.0009                         1.4                       0.0004                 0.0005         0.004
Resolution model                     0.0004                         0.69                      0.0002                 0.0003         0.022
Background lifetime model            0.0036                         2.0                       0.0007                 0.0011         0.058
Background angular distribution
  Parameterization                   0.0002                         0.02                      0.0001                 0.0001         0.001
  $\sigma_{ct}$ correlation          0.0002                         0.14                      0.0007                 0.0007         0.006
  Non-factorization                  0.0001                         0.06                      0.0004                 0.0004         0.003
$B^0 \to J/\psi K^*$ cross-feed      0.0014                         0.24                      0.0007                 0.0010         0.006
SVX alignment                        0.0006                         2.0                       0.0001                 0.0001         0.020
Mass resolution                      0.0001                         0.58                      0.0004                 0.0004         0.002
$\sigma_{ct}$ modeling               0.0012                         0.17                      0.0005                 0.0007         0.013
Pull bias                            0.0028                         -                         0.0013                 0.0021         -
Totals                               0.006                          3.6                       0.015                  0.015          0.07
a three-Gaussian resolution model, extracted from data,
and fit with the default two-Gaussian model.

As well as the lifetime resolution, the modeling of the
various components of the background lifetime can
systematically affect the measured lifetime. To check
the effect of any inaccuracy in our background lifetime
model as described in Sec. VI, we generate pseudoexperiments
with the decay time of the background events
taken from histograms of the $B_s^0$ mass sidebands and fit
with the default model.

We consider three possible sources of systematic uncertainty
related to the transversity angles of the background
events: mis-modeling of the parameterization described
in Sec. VI, ignoring the observed small correlations
between the three angles, and correlations between
the angles and the expected proper decay time uncertainty,
$\sigma_{ct}$. The effect of these sources of uncertainty
is checked using the actual data distributions from the
mass sidebands to generate pseudoexperiments and test
the difference between our model and the true distributions.
For the parameterization check, we simply use the
data background angular distributions in the generation
of the pseudoexperiments before fitting with the default
model. To check the effect of neglecting the small correlations
between the angles, we generate pseudoexperiments
where two of the background angles are sampled
randomly from the data distributions, and the third one
from a two-dimensional histogram according to the sampled
value of the second angle. To check the effect of
ignoring correlations between the transversity angles and
$\sigma_{ct}$, we sample the $\phi_T$ angle distribution, found to have
the largest correlation with $\sigma_{ct}$, using a two-dimensional
histogram of $\phi_T$ versus $\sigma_{ct}$ in order to generate the
pseudoexperiments. Ignoring these very small
correlations results in an almost negligible systematic
uncertainty on the measurements (see Table IV).

In the default fit, we do not account for contamination
from $B^0 \to J/\psi K^{*0}$ events mis-reconstructed as
$B_s^0 \to J/\psi\phi$ decays ($B^0$ cross-feed). A small fraction
of these events lies in the $B_s^0$ mass signal region. The
first step in identifying the size of the systematic effect
is to estimate the size of this contribution by using
measured production fractions of the $B_s^0$ and $B^0$
mesons, their relative decay rates to $J/\psi\phi$ and $J/\psi K^{*0}$,
respectively, and the probability for each type of event to
pass our final selection criteria when reconstructed under
the $B_s^0 \to J/\psi\phi$ hypothesis. Both the production fractions
and the branching fractions are taken from Ref. Q .
We estimate the efficiencies using simulation, with both
$B_s^0 \to J/\psi\phi$ and $B^0 \to J/\psi K^{*0}$ modes reconstructed as
$B_s^0 \to J/\psi\phi$ decay. The fraction $f$ of $B^0$ cross-feed events
in the $B_s^0$ sample is calculated as

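$$f(B^0\ \mathrm{in}\ B_s^0\ \mathrm{sample}) = \frac{f(b \to B^0)\,\mathcal{B}(B^0 \to J/\psi K^{*0})\,\epsilon(B^0)}{f(b \to B_s^0)\,\mathcal{B}(B_s^0 \to J/\psi\phi)\,\epsilon(B_s^0)}. \qquad (31)$$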

Using Eq. (31), we find that the fraction of $B^0$ cross-feed
into the signal sample of this analysis is $(1.6 \pm 0.6)\%$. To
make a conservative estimate of the systematic uncertainty
that this effect adds to the measurement of the
parameters of interest, we generate pseudoexperiments
with a fraction of 2.2% $B^0$ cross-feed, and fit with the default
model which does not account for this component.
The cross-feed component is generated using values of
the $B^0$ lifetime, decay width and transversity amplitudes
from the CDF angular analysis of $B^0 \to J/\psi K^{*0}$ [58].
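
The cross-feed estimate is a ratio of three-factor products, which the following sketch spells out. The numerical inputs are round placeholders chosen only to exercise the function; they are not the production fractions, branching fractions or efficiencies used in the analysis. The last line notes that the 2.2% used in the pseudoexperiments coincides with the measured central value plus one standard deviation, which we take to be the origin of the "conservative" fraction (an inference, not a statement from the text).

```python
def cross_feed_fraction(f_b0, br_b0, eff_b0, f_bs, br_bs, eff_bs):
    """Ratio of expected B0 cross-feed yield to Bs signal yield.

    f_*   : production fraction f(b -> B)
    br_*  : branching fraction of the decay mode
    eff_* : efficiency to pass the Bs -> J/psi phi selection
    """
    return (f_b0 * br_b0 * eff_b0) / (f_bs * br_bs * eff_bs)

# Placeholder inputs for illustration only (not the values used in the paper).
example = cross_feed_fraction(f_b0=0.4, br_b0=1.0e-3, eff_b0=1.0e-4,
                              f_bs=0.1, br_bs=1.0e-3, eff_bs=2.0e-2)
print(f"illustrative cross-feed fraction: {example:.3f}")

# Measured fraction plus one standard deviation, as quoted in the text.
conservative = 0.016 + 0.006
print(f"conservative cross-feed fraction: {conservative:.3f}")
```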

A systematic uncertainty can be introduced by the assumption
that the silicon detector is perfectly aligned,
when it could actually be mis-aligned by bowing of the
detector layers of up to 50 $\mu$m. A study on the effect
of the limited knowledge of the CDF II silicon detector
alignment concluded that a conservative estimation of the
systematic uncertainty on the decay length $c\tau$ in CDF
lifetime measurements is given by a 2 $\mu$m systematic
uncertainty on $c\tau$ [59]. This study was done by fully
reconstructing both data and simulation under different silicon
alignment assumptions, including shifts of $\pm 50$ $\mu$m in all
silicon detector components. The lifetime was fitted in
several $B \to J/\psi X$ channels, and the worst shift was
taken as the systematic uncertainty on the lifetime due
to the assumption of perfect silicon alignment.

We use the value of 2 $\mu$m systematic uncertainty on
$c\tau(B_s^0)$ to also assess secondary effects on the other
parameters of interest. Due to correlations between the
$B_s^0$ lifetime and the other physics parameters, it is expected
that an additional uncertainty on the lifetime
measurement will also cause uncertainties in the measurement
of the other parameters. To quantify the effect
on the other parameters, we generate pseudoexperiments
in which the decay time in each event is randomly shifted
by $\pm 2$ $\mu$m and fit in the usual manner to allow for comparisons
between the input and fitted values of the parameters
of interest.

In the fit model, we treat the mass resolution identically
for signal and background events. The effect of
any inaccuracy in this assumption can be tested by generating
pseudoexperiments with mass uncertainty distributions
modeled by histograms of $B_s^0$ sideband data for
background events and sideband-subtracted signal region
data for signal events separately, and then fitting with the
default model.

Finally, we consider the effect of mis-parameterization
of the $\sigma_{ct}$ distributions. To account for a possible effect,
we fit with the default model to pseudoexperiments
generated with the uncertainty distributions taken from
data histograms rather than the model described in
Sec. VI. This systematic check also accounts for any effect
caused by small observed correlations between $\sigma_{ct}$
and the invariant mass by sampling the background
uncertainties from separate upper and lower sideband histograms
according to the generated $B_s^0$ mass.

The total systematic uncertainty assigned to $\Delta\Gamma_s$ is
0.006 ps$^{-1}$, while 3.6 $\mu$m is assigned to the measurement
of $c\tau(B_s^0)$. Both $|A_\parallel(0)|^2$ and $|A_0(0)|^2$ are assigned an
uncertainty of 0.015. Finally, $\delta_\perp$ is assigned a 0.07 rad
uncertainty.
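
The totals can be cross-checked by summing the individual entries of Table IV in quadrature. The sketch below transcribes the table rows; the dictionary layout and function name are ours, and `None` marks the two entries that are left blank in the "Pull bias" row.

```python
import math

# Columns: Delta Gamma_s [1/ps], c*tau [um], |A_par(0)|^2, |A_0(0)|^2, delta_perp [rad]
TABLE_IV = {
    "signal efficiency: parameterization": (0.0024, 0.96, 0.0076, 0.008, 0.016),
    "signal efficiency: MC re-weighting": (0.0008, 0.94, 0.0129, 0.0129, 0.022),
    "signal mass model": (0.0013, 0.26, 0.0009, 0.0011, 0.009),
    "background mass model": (0.0009, 1.4, 0.0004, 0.0005, 0.004),
    "resolution model": (0.0004, 0.69, 0.0002, 0.0003, 0.022),
    "background lifetime model": (0.0036, 2.0, 0.0007, 0.0011, 0.058),
    "background angles: parameterization": (0.0002, 0.02, 0.0001, 0.0001, 0.001),
    "background angles: sigma_ct correlation": (0.0002, 0.14, 0.0007, 0.0007, 0.006),
    "background angles: non-factorization": (0.0001, 0.06, 0.0004, 0.0004, 0.003),
    "B0 cross-feed": (0.0014, 0.24, 0.0007, 0.0010, 0.006),
    "SVX alignment": (0.0006, 2.0, 0.0001, 0.0001, 0.020),
    "mass resolution": (0.0001, 0.58, 0.0004, 0.0004, 0.002),
    "sigma_ct modeling": (0.0012, 0.17, 0.0005, 0.0007, 0.013),
    "pull bias": (0.0028, None, 0.0013, 0.0021, None),
}

def quadrature_totals(rows):
    """Sum each column in quadrature, skipping blank (None) entries."""
    n_cols = len(next(iter(rows.values())))
    totals = []
    for col in range(n_cols):
        squares = (row[col] ** 2 for row in rows.values() if row[col] is not None)
        totals.append(math.sqrt(sum(squares)))
    return totals

totals = quadrature_totals(TABLE_IV)
for name, value in zip(("dGamma_s", "c*tau", "|A_par|^2", "|A_0|^2", "delta_perp"), totals):
    print(f"{name:>10s}: {value:.4f}")
```

After rounding, the sums reproduce the "Totals" row of Table IV (0.006 ps$^{-1}$, 3.6 $\mu$m, 0.015, 0.015 and 0.07 rad).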

VIII. FREQUENTIST ANALYSIS OF $\beta_s^{J/\psi\phi}$ AND $\Delta\Gamma_s$

As anticipated in Sec. VII, due to the pathological behavior
of the likelihood function with respect to $\beta_s^{J/\psi\phi}$
and $\Delta\Gamma_s$, we perform a frequentist analysis to determine
confidence regions for these parameters. In addition, we
perform a cross-check in which we determine credible intervals
for $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$ using Bayesian techniques
described in Sec. IX. In the main analysis we use
profile-likelihood ratio ordering [60] to determine the confidence
level region in the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$ space. The coverage of
the confidence region against deviations of the nuisance
parameters from their measured values is confirmed by
explicitly checking the effect of variations of the nuisance
parameters on the profile-likelihood shape. Following our
previous publication [15], the confidence regions are corrected
to guarantee a coverage of the true value with at
least the nominal confidence level. We thus choose this
method to quote the main results of this paper.

The likelihood function has several symmetries that
are discussed in detail in Ref. [H^]. In the absence of the
S-wave component in $B_s^0 \to J/\psi K^+K^-$ decays, the
likelihood function is symmetric under the simultaneous
transformations: $\beta_s^{J/\psi\phi} \to \pi/2 - \beta_s^{J/\psi\phi}$, $\Delta\Gamma_s \to -\Delta\Gamma_s$,
$\delta_\perp \to \pi - \delta_\perp$, and $\delta_\parallel \to 2\pi - \delta_\parallel$. An approximate symmetry
is also present under the above simultaneous transformations
of only $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$. The approximate
symmetry produces a local minimum in the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$
space in addition to the global minimum. We account
for this effect by performing the likelihood scan with $\delta_\parallel$
being started in the fit separately in the range $[0, \pi]$ and
then in the range $[\pi, 2\pi]$. At each point in the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$
plane, we choose the deeper of the two $-2\log\mathcal{L}$ likelihood
values (absolute minimum) as evaluated for the different
$\delta_\parallel$ ranges. This procedure guarantees that we use the
global, not the local minimum, at each point.

Once we have minimized the likelihood function on
data with respect to the $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$ parameters,
we proceed with determining the 68% and 95% confidence
regions. Constructing correct and informative
confidence regions from highly multi-dimensional likelihoods
is challenging and, as in our case, evaluating the
full 35-dimensional confidence space is computationally
prohibitive. To construct a proper coverage adjustment,
which ensures that the quoted 68% (95%) confidence levels
do indeed contain the true values of $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$
at least 68% (95%) of the time, the choice of the ordering
algorithm is important. We choose the profile-likelihood
ratio ordering method [60] described below. The obtained
profile-likelihood ratio is then used as a $\chi^2$ variable
to derive confidence regions in the two-dimensional space
of $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$. However, simulations show that the observed
profile-likelihood ratio deviates from a true $\chi^2$ distribution.
In particular, the resulting confidence regions
contain true values of the parameters of interest with
lower probability than the nominal confidence level and,
in addition, the profile-likelihood ratio appears to depend
on the true values of the nuisance parameters, which are
unknown. We therefore use a large number of pseudoexperiments
to derive the actual profile-likelihood ratio
distribution relevant for our data. The effect of systematic
uncertainties is accounted for by randomly sampling
a limited number of points in the space of all nuisance
parameters and using the most conservative of the resulting
profile-likelihood ratio distributions to calculate
the final confidence region. In the following, the coverage
adjustment procedure is described in detail.



A. Coverage adjustment 

To construct coverage adjusted confidence level regions
for each point in the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$ plane, we start by
calculating a p-value, which, given a certain hypothesis, describes
the probability to observe data as discrepant or
more discrepant than the data observed in our experiment.
The set of all points with a p-value larger than
$1 - x$ forms the C.L. region at level $x$. In particular, the set of
points with p-value larger than $1 - 0.95 = 0.05$ outlines
the 95% confidence region.

Since a main goal of our analysis is to determine the
compatibility of our data with the standard model expectation
for $\beta_s^{J/\psi\phi}$, we start by calculating the SM p-value.
We generate pseudoexperiments at the standard model
expected point in the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$ plane ($\beta_s^{J/\psi\phi} = 0.02$,
$\Delta\Gamma_s = 0.090$ ps$^{-1}$). When generating the pseudoexperiments,
we use the best fit values for all nuisance parameters
as observed in our data while $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$
are fixed to the SM expected values. The likelihood
function corresponding to each pseudoexperiment is first
maximized with all parameters floating, and then maximized
a second time with $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$ fixed to their
SM values while the remaining fit parameters (nuisance
parameters) are independently floating. We then form
twice the negative difference between the logarithms of
the likelihood values obtained in each of the two steps to
obtain a profile-likelihood ratio value $-2\Delta\log\mathcal{L}$. The
profile-likelihood ratio distribution from 1000 pseudoexperiments
is used to obtain the standard model
p-value, which is the fraction of pseudoexperiments with
$-2\Delta\log\mathcal{L}$ larger than the corresponding quantity observed
in data.
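
The p-value construction just described reduces to counting pseudoexperiments. The sketch below shows this counting with hypothetical helper names; the toy ensemble is drawn from an ideal $\chi^2$ distribution with two degrees of freedom purely as a stand-in, since the actual distribution has to come from fits to pseudoexperiments.

```python
import random

def profile_lr(nll_fixed, nll_free):
    """-2 Delta log L from two minimized negative log-likelihood values.

    nll_fixed: minimum of -log L with the parameters of interest fixed
    nll_free : minimum of -log L with all parameters floating
    """
    return 2.0 * (nll_fixed - nll_free)

def p_value(toy_values, observed):
    """Fraction of pseudoexperiments with -2 Delta log L larger than observed."""
    n_above = sum(1 for value in toy_values if value > observed)
    return n_above / len(toy_values)

# Stand-in ensemble: 1000 draws from a chi-square with 2 degrees of freedom,
# which is an exponential distribution with mean 2.
rng = random.Random(1)
toys = [rng.expovariate(0.5) for _ in range(1000)]

observed = profile_lr(nll_fixed=105.0, nll_free=104.0)  # hypothetical fit results
print(f"observed -2 Delta log L = {observed:.2f}, p-value = {p_value(toys, observed):.3f}")
```

With only 1000 pseudoexperiments the statistical precision of a p-value near 0.3 is roughly 0.015, which is adequate for the 68% and 95% levels considered here.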

We construct the cumulative distribution of $-2\Delta\log\mathcal{L}$
to obtain a mapping between the p-value = 1 - C.L. and
$-2\Delta\log\mathcal{L}$, as shown in Fig. 14 by the solid black histogram
which has been interpolated. In an ideal situation,
when the likelihood function is Gaussian with respect
to $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$, this dependence should be a
$\chi^2$ distribution with two degrees of freedom as indicated
by the green line. It is evident from Fig. 14 that, at
least with our current data sample size, we are not in an
asymptotic, Gaussian regime. To test the dependence of

FIG. 14. (color online). Mapping of p-value (1 - C.L.) as
a function of twice the negative difference of log-likelihoods
($-2\Delta\log\mathcal{L}$) as evaluated in pseudoexperiments. The ideal
dependence is a $\chi^2$ distribution with two degrees of freedom
as shown by the solid (green) line. The actual observed mapping
for our data is shown by the black histogram, while the
corresponding distributions for the alternative ensembles are
displayed by the colored, dashed histograms.



the obtained mapping on the chosen SM point for $\beta_s^{J/\psi\phi}$
and $\Delta\Gamma_s$, we construct similar maps between the confidence
level and $-2\Delta\log\mathcal{L}$ for other random points in
the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$ plane and find very similar dependencies.
Consequently, we consider the mapping determined at
the SM point to apply for all points in the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$
plane.

To obtain confidence regions in $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$, we determine
profile-likelihood ratios for a grid on the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$
plane. In a Gaussian regime, the points with
p-value = 0.05, corresponding to a confidence level of 95%,
are identified by the intersection of the two-dimensional
profile-likelihood function and a horizontal plane which
is 5.99 units above the global minimum. The value 5.99
is the point on the $-2\Delta\log\mathcal{L}$ axis where the $\chi^2$ distribution
with two degrees of freedom (green line) intersects
the $1 - 0.95 = 0.05$ level (red dashed line) in Fig. 14.
The 68% C.L. is correspondingly obtained by the top
horizontal (blue) line. The intersection between the 0.05
level and the actual mapping (black histogram) is at
$-2\Delta\log\mathcal{L} = 7.34$, which means that the 95% confidence
region is obtained by taking the intersection of the
two-dimensional profile-likelihood function and a horizontal
plane which is 7.34 units above the global minimum. In
this case we find the standard model p-value for $\beta_s^{J/\psi\phi}$
to be 0.27. Clearly, this procedure leads to confidence
regions larger than in the ideal, Gaussian case.

In order to guarantee additional coverage over a conservative
range of possible values of nuisance parameters,
sixteen alternative ensembles are generated. As
we do not know the true values for these nuisance parameters,
we compute the coverage over a wide range
of possible values but always within their physically allowed
range [61]. In particular, each alternative ensemble
is produced by generating pseudoexperiments with
nuisance parameters randomized uniformly within $\pm 5\sigma$
of their best fit values as obtained from maximizing the
likelihood function on data. In these pseudoexperiments,
the parameters $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$ are again fixed to their
standard model expectation. We choose a random variation
of $\pm 5\sigma$ over the nuisance parameters because we
aim to cover the space of nuisance parameters with a
C.L. much larger than the anticipated C.L. for our final
result. Exceptions to this approach are the strong phases,
which are generated only within the range from zero to
$2\pi$, and the dilution scale factors, which are generated so
that the dilution is always between zero and one. The
other exception to applying a $\pm 5\sigma$ range is the phase
$\delta_{SW}$, which is generated flat between 0 and $2\pi$. Since
the S-wave fraction $f_{SW}$ is consistent with zero as discussed
in Sec. VIII B, we lack sensitivity to the associated
phase and choose to vary it over its full possible range.

To determine this additional coverage adjustment,
we again generate 1000 pseudoexperiments for each of
the sixteen alternative ensembles. The same profile-likelihood
ratio procedure is performed on each ensemble,
and the broadest and thus most conservative p-value is
taken to form the final confidence regions. The colored,
dashed lines in Fig. 14 show the resulting mappings between
1 - C.L. and $-2\Delta\log\mathcal{L}$ for each of the sixteen alternative
ensembles. We use the p-value of the most conservative
ensemble to determine the corresponding 68% and
95% confidence regions for our data. The intersection between
the 0.05 (0.32) level of 1 - C.L. and the
most conservative mapping is at $-2\Delta\log\mathcal{L} = 8.79$ (4.19).
This means that the 95% (68%) confidence region with
guaranteed coverage is obtained by taking the intersection
of the two-dimensional profile-likelihood ratio and
a horizontal plane which is 8.79 (4.19) units above the
global minimum. Note that this procedure of randomly
sampling a limited number of points in the space of all
nuisance parameters and using the most conservative of
the resulting profile-likelihood ratio distributions automatically
accounts for the effect of systematic uncertainties.
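
The relation between the ideal and the coverage-adjusted thresholds can be made explicit in a few lines. For a $\chi^2$ distribution with two degrees of freedom the survival function is $e^{-x/2}$, so the ideal threshold at confidence level C.L. is $-2\ln(1 - \mathrm{C.L.})$, which gives 5.99 at 95%. The adjusted threshold is instead read off the pseudoexperiment distributions, taking the largest value over all ensembles. The function names and the tiny example ensembles below are ours and serve only to illustrate the logic.

```python
import math

def chi2_2dof_threshold(cl):
    """Ideal -2 Delta log L threshold for a chi-square with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - cl)

def empirical_threshold(toy_values, cl):
    """Smallest toy value x such that at least a fraction cl of toys are <= x."""
    ordered = sorted(toy_values)
    index = max(0, min(len(ordered) - 1, math.ceil(cl * len(ordered)) - 1))
    return ordered[index]

def adjusted_threshold(ensembles, cl):
    """Most conservative (largest) empirical threshold over all ensembles."""
    return max(empirical_threshold(toys, cl) for toys in ensembles)

print(f"ideal 95% threshold:    {chi2_2dof_threshold(0.95):.2f}")
print(f"ideal 68.27% threshold: {chi2_2dof_threshold(0.6827):.2f}")

# Two tiny hypothetical ensembles, only to show the max-over-ensembles step.
ensembles = [[4.0, 1.0, 3.0, 2.0], [2.0, 8.0, 6.0, 4.0]]
print(f"adjusted 50% threshold: {adjusted_threshold(ensembles, 0.5):.1f}")
```

In the analysis the same idea, applied to sixteen ensembles of 1000 pseudoexperiments each, moves the 95% threshold from the ideal 5.99 to 8.79.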

A similar coverage adjustment procedure is carried out
to determine individual confidence intervals for $\beta_s^{J/\psi\phi}$
and $\Delta\Gamma_s$ separately. When determining the $\beta_s^{J/\psi\phi}$ confidence
interval, $\Delta\Gamma_s$ is randomized in the pseudoexperiment
generation and treated analogously with the other
nuisance parameters. We again generate 1000 pseudoexperiments
per alternative ensemble for the final coverage
adjustment in the one-dimensional case. Using a similar
approach as in the two-dimensional case, we obtain again

FIG. 15. Confidence regions in the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$
plane for the fit without application of flavor tagging. The
solid (blue) and dot-dashed (red) contours show the 68% and
95% confidence regions, respectively. The dotted lines are the
symmetry axes corresponding to the profiled likelihood invariance
under $\beta_s^{J/\psi\phi} \to \pi/2 - \beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s \to -\Delta\Gamma_s$. In addition,
the likelihood is invariant under $\beta_s^{J/\psi\phi} \to -\beta_s^{J/\psi\phi}$. The
shaded (green) band is the theoretical prediction of mixing-induced
CP violation.



mappings of 1 - C.L. versus $-2\Delta\log\mathcal{L}$ for alternative ensembles
with randomized nuisance parameters.



B. Results using frequentist approach 

We present frequentist $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$ confidence regions
and p-values obtained according to the procedure described
in Sec. VIII above. The $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$ confidence regions
without the application of flavor tagging are shown
in Fig. 15. The SM prediction is indicated by the black
marker, and the 68% and 95% C.L. regions are shown as
solid (blue) and dot-dashed (red) contours, respectively.
We find the p-value for the agreement of $\beta_s^{J/\psi\phi}$ with the standard
model prediction to be 0.10. The shaded (green) band
is the theoretical prediction of mixing-induced CP violation.
As discussed above, in the absence of an S-wave
component, the likelihood function is symmetric under
the simultaneous transformations $\beta_s^{J/\psi\phi} \to \pi/2 - \beta_s^{J/\psi\phi}$,
$\Delta\Gamma_s \to -\Delta\Gamma_s$, $\delta_\perp \to \pi - \delta_\perp$ and $\delta_\parallel \to 2\pi - \delta_\parallel$. In addition, if
no flavor tagging information is used, an additional symmetry
is present in the likelihood: $\beta_s^{J/\psi\phi} \to -\beta_s^{J/\psi\phi}$. As a
consequence of these symmetries, the likelihood function
has four global maxima as can be seen in Fig. 15.

Once the flavor tagging information is added to the
analysis, the $\beta_s^{J/\psi\phi} \to -\beta_s^{J/\psi\phi}$ symmetry is removed
and the likelihood function has only two global maxima
corresponding to the likelihood invariance under
$\beta_s^{J/\psi\phi} \to \pi/2 - \beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s \to -\Delta\Gamma_s$. The $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$
confidence regions for the flavor tagged analysis,
after coverage adjustment, are shown in Fig. 16. Our
sensitivity to $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$ has substantially improved
compared to our previously published measurement [15],
as evidenced by the decrease in size of the confidence region.
The result is also more consistent with the standard
model prediction.

FIG. 16. Confidence regions in the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$
plane for the fit including flavor tagging information. The
solid (blue) and dot-dashed (red) contours show the 68% and
95% confidence regions, respectively. The dotted lines are
the symmetry axes corresponding to the profiled likelihood
invariance under $\beta_s^{J/\psi\phi} \to \pi/2 - \beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s \to -\Delta\Gamma_s$. The
shaded (green) band is the theoretical prediction of mixing-induced
CP violation.

To illustrate the effect of the coverage adjustment, the
left-hand side of Fig. 17 compares the 68% and 95%
C.L. contours after coverage adjustment with the corresponding
contours before the coverage adjustment procedure.
A small increase in the size of the contours can
be seen. As a further cross check, we also performed
the same fit setting the S-wave fraction to zero, as shown
on the right-hand side of Fig. 17. The contour regions
corresponding to a profile-likelihood ratio variation of
$-2\Delta\log\mathcal{L} = 2.30$ (blue) and
$-2\Delta\log\mathcal{L} = 5.99$ (red)
are compared when including (solid) and not including
(dashed) the S-wave fraction in the likelihood fit. The
contours are almost identical.
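
The values 2.30 and 5.99 quoted above are the familiar two-parameter thresholds in the Gaussian limit, before any coverage adjustment. As a minimal sketch, assuming that $-2\Delta\log\mathcal{L}$ follows a chi-square distribution with two degrees of freedom, the thresholds can be reproduced in closed form. The function name `two_dof_threshold` is illustrative and not part of the analysis software.

```python
import math


def two_dof_threshold(confidence_level: float) -> float:
    """Return the -2*Delta(log L) value that encloses the given probability
    for a chi-square distribution with two degrees of freedom."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must lie strictly between 0 and 1")
    # For 2 dof the chi-square CDF is 1 - exp(-q/2), which inverts analytically.
    return -2.0 * math.log(1.0 - confidence_level)


for cl in (0.6827, 0.95):
    print(f"C.L. {cl:.4f}: -2 Delta log L = {two_dof_threshold(cl):.2f}")
```

The coverage-adjusted contours in the text use larger thresholds than these, which is why the adjusted regions in Fig. 17 are slightly wider.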

FIG. 17. (color online). Left: Confidence regions in the
$\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$ plane for the fit including
flavor tagging information before (dashed) and after (solid)
performing the coverage adjustment. Right: Comparison of including
(solid) and not including (dashed) the S-wave contribution in the
likelihood fit.

The one-dimensional likelihood scan in the quantity
$\beta_s^{J/\psi\phi}$ after coverage adjustment is shown in
Fig. 18 on the left-hand side. In a Gaussian scenario the 68% (95%)
C.L. range is between the points of intersection of the
profile-likelihood scan curve and a horizontal line which
is one unit (four units) above the global minimum. In
our case, after coverage adjustment, the solid (blue) and
dot-dashed (red) horizontal lines which indicate the 68%
and 95% C.L. ranges are at 2.74 and 7.11 units above the
global minimum, respectively. We obtain



$\beta_s^{J/\psi\phi} \in [0.02, 0.52] \cup [1.08, 1.55]$ at 68% confidence level,

$\beta_s^{J/\psi\phi} \in [-\pi/2, -1.46] \cup [-0.11, 0.65] \cup [0.91, \pi/2]$ at 95% confidence level.



We find the standard model p-value for $\beta_s^{J/\psi\phi}$ to be 0.30,
corresponding to about one Gaussian standard deviation
from the SM expectation, as is also evidenced in Fig. 18.
In comparison with the recent measurement of $\beta_s^{J/\psi\phi}$
from the D0 collaboration using a data sample based on
8 fb$^{-1}$ of integrated luminosity [19], we find a similar
region to constrain $\beta_s^{J/\psi\phi}$ at the 68% C.L. and obtain
a similar p-value for comparison with the SM expectation.
However, our result constrains $\beta_s^{J/\psi\phi}$ to a narrower
region at the 95% confidence level.
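
The correspondence between a p-value of 0.30 and "about one Gaussian standard deviation" can be checked with a short sketch. It assumes the two-sided convention, in which the p-value is the probability outside $\pm n\sigma$ of a unit Gaussian; the text does not state the convention, so this is an assumption, and the helper name `p_value_to_sigma` is illustrative.

```python
from statistics import NormalDist


def p_value_to_sigma(p_value: float) -> float:
    """Convert a two-sided p-value into an equivalent number of
    Gaussian standard deviations (assumed convention, see lead-in)."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    # Half of the p-value sits in each tail of the unit Gaussian.
    return NormalDist().inv_cdf(1.0 - p_value / 2.0)


print(f"p = 0.30 corresponds to {p_value_to_sigma(0.30):.2f} sigma")
```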

In addition, we quote a confidence interval for the
S-wave fraction after performing a likelihood scan for
$f_{SW}$, as shown in Fig. 19. We also show a quadratic
fit overlaid, indicating the parabolic shape of the likelihood
around the minimum, which we integrate to calculate
upper limits on the S-wave fraction. The upper
limit on the S-wave fraction over the mass interval
$1.009 < m(K^+K^-) < 1.028$ GeV/$c^2$, corresponding to
the selected $K^+K^-$ signal region, is 4% of the total signal
at the 68% confidence level, and $f_{SW} < 6\%$ at 95% C.L.
Since the analysis is limited to events in a narrow $K^+K^-$
mass range around the $\phi$ signal, the observed S-wave
fraction is small and its effect on the observables quoted
in this analysis is minor. We verified with pseudoexperiments
that a sizeable amount of S-wave would affect the
measured value of $\beta_s^{J/\psi\phi}$. In contrast to our result, the
recent D0 publication [19] quotes a sizeable fraction of
$(17.3 \pm 3.6)\%$ for the S-wave fraction over almost the same
$K^+K^-$ mass range. We also perform a likelihood scan
to determine the associated S-wave phase but, as expected
from simulated experiments, we find that we are
not sensitive to $\delta_{SW}$ with the current data sample size.

Finally, we perform a flavor tagged analysis with $\Delta\Gamma_s$
Gaussian constrained to the theoretical prediction of
$2|\Gamma_{12}^s| = (0.090 \pm 0.024)$ ps$^{-1}$ [9]. Under this constraint,
$\beta_s^{J/\psi\phi}$ is found in the range $[0.05, 0.40] \cup [1.17, 1.49]$ at
the 68% confidence level, and within
$[-\pi/2, -1.51] \cup [-0.07, 0.54] \cup [1.03, \pi/2]$ at 95% C.L., as shown
in Fig. 18 on the right-hand side. The p-value for the SM expected
value of $\beta_s^{J/\psi\phi}$ from this constrained fit is 0.21,
corresponding to a deviation from the SM expectation of $1.3\sigma$
significance. We note that the likelihood scans in Fig. 18
exhibit small deviations from the symmetry in $\beta_s^{J/\psi\phi}$ that
is expected according to our discussion above. These deviations
arise from the small S-wave fraction that our likelihood
fit finds, as well as from the choice of binning and the numerical
precision in determining the displayed $-2\Delta\log\mathcal{L}$
values.

FIG. 18. Likelihood scan in $\beta_s^{J/\psi\phi}$ with no constraint
(left) and with $\Delta\Gamma_s$ constrained to the SM prediction (right).
After coverage adjustment the solid (blue) and dot-dashed (red)
horizontal lines indicate the 68% (95%) C.L. range above the global
minimum.

FIG. 19. Likelihood scan of the S-wave fraction with a
quadratic fit overlaid indicating the parabolic shape of the
likelihood around the minimum.



IX. RESULTS ON $\beta_s^{J/\psi\phi}$ AND $\Delta\Gamma_s$ IN A
BAYESIAN APPROACH

In addition to the frequentist results shown in the previous
section, we use a Bayesian analysis to provide cross-checks
on the determination of the physics parameters.
We use Bayesian inference via integration of the posterior
density obtained from the likelihood function described
in Sec. VI over the nuisance parameters and over those
physics parameters in which we are not presently interested.

The starting point for this Bayesian approach is the
likelihood function, $\mathcal{L}(x|\theta,\nu)$, where $x$ are the
experimental measurements including the $B_s^0$ candidate decay
time and invariant mass, the transversity angles and
tagging information, while $\Omega = (\theta,\nu)$ is a vector
distinguishing the physics parameters $\theta$ described in Table III
from the remaining nuisance parameters $\nu$ in the fit, which
describe features such as background shapes, tagging performance,
and detector resolution (see Sec. VI). In our
analysis the dimensionality of $\theta$ and $\nu$ is 11 and 24,
respectively.

Within the Bayesian approach to statistical inference,
Bayes' theorem defines the posterior probability density
given the observed data set $x$:

$$ p(\theta | x) = \frac{p(x|\theta)\, p(\theta)}{\int p(x|\theta)\, p(\theta)\, d^N\theta}, \qquad (32) $$



where $p(x|\theta)$ is the likelihood function $\mathcal{L}(x|\theta)$ and
$p(\theta)$ is the prior probability density for $\theta$, which describes the
knowledge about the parameters $\theta$ that we assume prior to
our measurement. The projection of the $N$-dimensional
posterior density onto $M$ parameters of physical interest
corresponds to an integration over all the other $N-M$ parameters,
where the limits of integration cover all possible
values for the $N-M$ parameters. This projection gives
a new posterior density $p(\alpha | x)$ for parameters $\alpha \in \theta$.
For a single variable $\alpha$ and a uniform prior density this
expression reduces to

$$ p(\alpha | x) = \int p(\theta | x)\, d^{N-1}\theta. \qquad (33) $$



We compute a representation of the 35-dimensional
likelihood function including nuisance parameters, which
we store in a computer-readable data format (Ntuple).
These Ntuples contain Markov chains, which have been
generated by a Markov chain Monte Carlo (MCMC) technique [62].
Projections of the Markov chains onto subspaces
of physical parameters of interest, in particular
$\beta_s^{J/\psi\phi}$, $\Delta\Gamma_s$, etc., are then performed. In addition, we
compute credible intervals for certain parameters and
credible contours for pairs of parameters derived from
these projections, as discussed in more detail below.

From a Markov chain projection one may easily draw
conclusions about specific values of parameters such as
$\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$ with a view toward propagating that
information into global fits of, e.g., CKM parameters
and incorporating certain prior information about, e.g.,
mixing-induced CP violation and the values of the strong
phases that appear in $B_s^0 \to J/\psi\phi$ decays. By projecting
a Markov chain onto a subspace of parameters of dimension
$M$, i.e., making a histogram or a scatter-plot from
the Ntuples, one is in fact performing a numerical integration
of the posterior density over the other $N-M$
parameters. The normalization factor, i.e., the denominator
in Eq. (32), is easily identified as the total number
of points of the Markov chain.
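
The statement that a histogram of the chain performs the marginalization can be illustrated with a minimal sketch. The chain below is a hypothetical stand-in drawn from independent Gaussians, not the analysis chain, and the column layout and the helper `project` are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical stand-in for one stored chain: rows are states, columns are
# (beta_s, delta_gamma, delta_perp). Real chains would be read from the Ntuples.
chain = rng.normal(loc=[0.3, 0.1, 2.9], scale=[0.1, 0.04, 0.2], size=(100_000, 3))


def project(chain, column, bins, value_range):
    """Histogram one column of the chain. Ignoring the other columns is the
    numerical integration over the remaining parameters."""
    counts, edges = np.histogram(chain[:, column], bins=bins, range=value_range)
    # Normalize by the total number of chain points times the bin width,
    # so that the histogram approximates the marginal posterior density.
    density = counts / (chain.shape[0] * np.diff(edges))
    return density, edges


density, edges = project(chain, column=0, bins=80, value_range=(-0.7, 1.3))
print("integral of projected density:", np.sum(density * np.diff(edges)))
```

A two-dimensional projection works the same way with `np.histogram2d` applied to two columns.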



We can define a credible interval $[\alpha_L, \alpha_U]$ for the
parameter $\alpha$ with probability content $\beta$ through

$$ \int_{\alpha_L}^{\alpha_U} p(\alpha | x)\, d\alpha = P(\alpha_L < \alpha < \alpha_U) = \beta. \qquad (34) $$



The credible interval $[\alpha_L, \alpha_U]$ contains a fraction $\beta$ of
the posterior density about $\alpha$, but it is not unique.
However, we can build a "shortest interval" using the
straightforward algorithm of maximum probability ordering,
accepting into the interval the largest values
of the PDF $p(\alpha|x)$ first. Using the same algorithm as in
the one-dimensional case, we build credible contours in
the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$ plane. A credible interval (or contour)
does not necessarily cover the true value of a parameter
(or parameters) with any given frequentist probability.
On the technical side, however, the Bayesian approach
allows for the combination of experimental results and
theoretical inputs in a straightforward manner. In addition,
the Bayesian technique can be trivially modified
to incorporate other conditions on, for example, mixing-induced
CP violation or constraints on the strong phase
angles through non-uniform prior probability densities
(see Sec. IX B).
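
The maximum probability ordering described above can be sketched on a binned posterior density. This is an illustration under stated assumptions, not the analysis code: the samples are a hypothetical unit Gaussian, and the helper `highest_density_bins` is a name introduced here. Because bins are accepted in order of decreasing density, the accepted set may consist of several disjoint pieces when the posterior is multimodal.

```python
import numpy as np


def highest_density_bins(density, widths, content):
    """Return a boolean mask of the bins accepted by maximum probability
    ordering until the requested probability content is reached."""
    order = np.argsort(density)[::-1]              # highest density first
    mass = np.cumsum(density[order] * widths[order])
    # Smallest number of bins whose accumulated probability reaches `content`.
    n_keep = int(np.searchsorted(mass, content)) + 1
    mask = np.zeros(density.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask


rng = np.random.default_rng(seed=2)
samples = rng.normal(size=200_000)                 # hypothetical posterior samples
density, edges = np.histogram(samples, bins=200, range=(-5.0, 5.0), density=True)
widths = np.diff(edges)

mask = highest_density_bins(density, widths, content=0.6827)
lower, upper = edges[:-1][mask].min(), edges[1:][mask].max()
print(f"68.27% shortest interval: [{lower:.2f}, {upper:.2f}]")
```

The same ordering applied to the bins of a two-dimensional histogram yields credible contours.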

To verify convergence of the Markov chain, we generate
sixteen independent chains. A burn-in phase of
approximately 10 000 steps is identified. We discard the
first 250 000 states in each chain and keep the following
one million states. This means the probability densities
shown below are based on sixteen million states.



A. Results using Bayesian approach 

The projections of the sixteen chains onto the variables
$\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$ are displayed in Fig. 20 together with the
sum of all sixteen chains. The close agreement between
the sixteen independent chains on the left-hand side of
Fig. 20 also demonstrates the convergence of the computation.
Using this Bayesian analysis of the data, we
obtain

$\beta_s^{J/\psi\phi} \in [0.11, 0.41] \cup [1.16, 1.47]$ as 68% credible interval,
$\beta_s^{J/\psi\phi} \in [-0.04, 0.59] \cup [0.98, 1.62]$ as 95% credible interval,
$\Delta\Gamma_s \in [-0.14, -0.06] \cup [0.06, 0.14]$ ps$^{-1}$ as 68% credible interval,
$\Delta\Gamma_s \in [-0.18, -0.02] \cup [0.02, 0.18]$ ps$^{-1}$ as 95% credible interval.



The joint posterior probability densities and credible
contours in the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$ plane are displayed in Fig. 21.
The narrow band shown in the figure is the theoretical
prediction of mixing-induced CP violation using
$2|\Gamma_{12}^s| = (0.090 \pm 0.024)$ ps$^{-1}$ [9], which overlaps with our result.
Furthermore, we calculate

$$ \int_{A_{\beta_s}} p(\beta_s^{J/\psi\phi})\, d\beta_s^{J/\psi\phi} = 0.119, \qquad (35) $$

where $A_{\beta_s} = \{\beta_s^{J/\psi\phi} ;\ p(\beta_s^{J/\psi\phi}) < p(\beta_s^{SM})\}$, indicating
that, as the contour is expanded to enclose larger credibility,
the standard model prediction $\beta_s^{SM}$ is first included
within the enclosed region at a credibility of 88.1%.

FIG. 20. (color online). Bayesian posterior densities for the
variables $\beta_s^{J/\psi\phi}$ (top) and $\Delta\Gamma_s$ (bottom). The left plots show
projections of sixteen independent Markov chains, while the right
two plots show the posterior densities with 68% and 95%
credible intervals in dark-solid (blue) and light-solid (red) areas,
respectively.

FIG. 21. (color online). Joint posterior probability density in
the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$ plane for the combined analysis. The
dark-solid (blue) and light-solid (red) contours show the 68% and
95% credible regions, respectively. The narrow band (green)
is the theoretical prediction of mixing-induced CP violation.

We also examine the posterior density in the variables
$\delta_\parallel$ versus $\delta_\perp$, as shown in Fig. 22. It is predicted that
the phases in $B_s^0 \to J/\psi\phi$ match those in the equivalent
decay $B^0 \to J/\psi K^{*0}$ to within 10° [63]. The
measured values of these phases in the $B^0 \to J/\psi K^{*0}$
decay ($\delta_\parallel = -2.93 \pm 0.08$ (stat) $\pm 0.04$ (syst) rad and
$\delta_\perp = 2.91 \pm 0.05$ (stat) $\pm 0.03$ (syst) rad) are overlaid
in the form of a green ellipse in Fig. 22. The width of
this ellipse includes the 10° theoretical uncertainty added
in quadrature with the experimental uncertainties on $\delta_\parallel$
and $\delta_\perp$. For one mode of the probability density, good
agreement between the $B_s^0$ and $B^0$ systems is observed, as
predicted in Ref. [63].

FIG. 22. (color online). Posterior density in the strong phases
$\delta_\perp$ versus $\delta_\parallel$ overlaid with the prediction that the phases in
$B_s^0 \to J/\psi\phi$ match those in $B^0 \to J/\psi K^{*0}$ decays to within
10° [63]. The dark-solid (blue) and light-solid (red) contours
show the 68% and 95% credible regions, respectively. The
width of the light-shaded (green) ellipse includes the 10° theoretical
uncertainty added in quadrature with the experimental
uncertainties on $\delta_\parallel$ and $\delta_\perp$ from $B^0 \to J/\psi K^{*0}$.

TABLE V. Summary of the sensitivity study. The 68% credibility
interval on $\beta_s^{J/\psi\phi}$ is given for the unconstrained result
and when $2|\Gamma_{12}^s|$ is constrained to its SM prediction.

Variation                          Constrained     Unconstrained
Default                            [0.09, 0.32]    [0.11, 0.41]
Flat $\sin 2\beta_s^{J/\psi\phi}$           [0.08, 0.31]    [0.09, 0.37]
Flat $\cos\delta_\perp$                 [0.09, 0.33]    [0.10, 0.43]
Flat $\cos\delta_\parallel$             [0.09, 0.32]    [0.11, 0.41]
Previous three together            [0.07, 0.31]    [0.09, 0.39]
Flat in amplitudes                 [0.09, 0.32]    [0.11, 0.41]
Gaussian mix.-ind. CP viol.        [0.09, 0.34]

B. Constrained results

Figure 21 shows that our measurement in the $\beta_s^{J/\psi\phi}$-$\Delta\Gamma_s$
plane is consistent with the hypothesis of mixing-induced
CP violation as well as with the hypothesis that
the measured CP violation originates from the standard
model. We apply the hypothesis of mixing-induced CP
violation, together with the theoretical calculation of $\Gamma_{12}^s$,
to our data in the form of a prior density during the computation
of the MCMC. We carry out this calculation
by simply re-weighting the Markov chains using a new
flat prior density derived from the theoretical calculation,
which gives $2|\Gamma_{12}^s| = (0.090 \pm 0.024)$ ps$^{-1}$ [9].
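
Re-weighting a chain with a new prior amounts to giving each stored state a weight equal to the new prior density evaluated at that state. The sketch below is illustrative only and rests on two assumptions that are not spelled out in this section: that the mixing-induced CP violation hypothesis takes the form $\Delta\Gamma_s = 2|\Gamma_{12}^s|\cos(2\beta_s^{J/\psi\phi})$, and that the flat prior is uniform for $2|\Gamma_{12}^s|$ within $0.090 \pm 0.024$ ps$^{-1}$ and zero outside. The chain is a hypothetical uniform stand-in, and `mixing_band_weight` is a name introduced here.

```python
import numpy as np


def mixing_band_weight(beta_s, delta_gamma, central=0.090, half_width=0.024):
    """Weight 1 for states compatible with the assumed relation
    delta_gamma = 2|Gamma_12| cos(2 beta_s), with 2|Gamma_12| inside
    central +/- half_width (ps^-1), and weight 0 otherwise."""
    cos_term = np.cos(2.0 * np.asarray(beta_s))
    edge_a = (central - half_width) * cos_term
    edge_b = (central + half_width) * cos_term
    low, high = np.minimum(edge_a, edge_b), np.maximum(edge_a, edge_b)
    delta_gamma = np.asarray(delta_gamma)
    return ((delta_gamma >= low) & (delta_gamma <= high)).astype(float)


rng = np.random.default_rng(seed=4)
# Hypothetical chain with a flat posterior, used only to exercise the weights.
beta_chain = rng.uniform(-np.pi / 2.0, np.pi / 2.0, size=100_000)
dgamma_chain = rng.uniform(-0.3, 0.3, size=100_000)

weights = mixing_band_weight(beta_chain, dgamma_chain)
# A weighted histogram gives the re-weighted posterior projection onto beta_s.
posterior, edges = np.histogram(beta_chain, bins=60, weights=weights)
print("fraction of states kept:", weights.mean())
```

A Gaussian constraint, as considered in the sensitivity analysis, would replace the 0/1 weight by a Gaussian density in the implied value of $2|\Gamma_{12}^s|$.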

The 68% credible interval on the CP-violating quantity
$\beta_s^{J/\psi\phi}$ is $\beta_s^{J/\psi\phi} \in [0.09, 0.32] \cup [1.24, 1.48]$. The posterior
density in $\beta_s^{J/\psi\phi}$ alone is shown in Fig. 23 on the left-hand
side. Again, we calculate the quantity

$$ \int_{A_{\beta_s}} p(\beta_s^{J/\psi\phi})\, d\beta_s^{J/\psi\phi} = 0.131, \qquad (36) $$

where $A_{\beta_s} = \{\beta_s^{J/\psi\phi} ;\ p(\beta_s^{J/\psi\phi}) < p(\beta_s^{SM})\}$. The SM is
first included at a credibility of 86.9%.
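
The credibility at which a given point is first included follows from the tail integral above as one minus the probability of the region where the posterior density is lower than at that point (here $1 - 0.131 = 0.869$). A minimal sketch on a gridded density is given below; the unit Gaussian and the test point at 1.5 are hypothetical and chosen only because the answer is known analytically, and `credibility_of_point` is a name introduced for this example.

```python
import numpy as np


def credibility_of_point(grid, density, point):
    """Credibility of the smallest highest-density region containing `point`,
    for a posterior density sampled on a uniform grid."""
    step = grid[1] - grid[0]
    density = density / (density.sum() * step)       # normalize to unit area
    density_at_point = np.interp(point, grid, density)
    # Probability of the region where the density is below its value at the point.
    tail = density[density < density_at_point].sum() * step
    return 1.0 - tail


grid = np.linspace(-8.0, 8.0, 16001)
gaussian = np.exp(-0.5 * grid**2)                     # hypothetical posterior shape
print(f"credibility: {credibility_of_point(grid, gaussian, 1.5):.3f}")
```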

As can be seen in Fig. 22, the theoretical predictions
of Ref. [63] are consistent with one of the modes of the
probability density in the $\delta_\perp$ versus $\delta_\parallel$ plane but not
with the other mode. In the following, conditional posterior
densities are used to show that the favored mode
of probability from Fig. 22 corresponds to the solution
$\beta_s^{J/\psi\phi} \in [0.09, 0.32]$.

The sequence of plots in Fig. 24 illustrates this statement.
The figures in the top row show the conditional
posterior density in the parameters $\delta_\parallel$ and $\delta_\perp$ after
imposing the requirement $|\beta_s^{J/\psi\phi}| < \pi/4$. It can be
seen that this condition completely eliminates one of the
modes of the probability density in the $\delta_\parallel$ versus $\delta_\perp$
parameter space. The plots in the bottom row of Fig. 24
show that, conversely, the condition $\pi/2 < \delta_\perp < 3\pi/2$
eliminates one mode of the probability projected onto
$\beta_s^{J/\psi\phi}$, while there is virtually no impact on the other
mode with $|\beta_s^{J/\psi\phi}| < \pi/4$.

These conditional probabilities allow us to visualize
what is happening in the larger space of all four parameters
($\beta_s^{J/\psi\phi}$, $\Delta\Gamma_s$, $\delta_\parallel$ and $\delta_\perp$), and to identify the
prediction of Ref. [63] with only one of the solutions for
$\beta_s^{J/\psi\phi}$, namely $\beta_s^{J/\psi\phi} \in [0.09, 0.32]$ at the 68% credible
interval. This result confirms the early finding in our previous
publication [15], which indicated that the solution
centered in $0 < \beta_s^{J/\psi\phi} < \pi/4$ and $\Delta\Gamma_s > 0$ corresponds
to $\cos(\delta_\perp) < 0$ and $\cos(\delta_\perp - \delta_\parallel) > 0$, while the opposite
is true for the solution centered in $\pi/4 < \beta_s^{J/\psi\phi} < \pi/2$
and $\Delta\Gamma_s < 0$.



C. Sensitivity analysis 

It is a well-known fact that Bayesian results depend on
the chosen prior densities; a flat prior in a given metric
might, in general, not be flat in another metric. To study
such effects, we carried out a sensitivity analysis in order
to characterize the degree to which the Bayesian results
of this section depend upon the chosen input priors. The
sensitivity analysis was performed by weighting the probability
density with the Jacobian of the transformation
from the default parameterization to the desired parameterization.
Using this technique we checked the variation
of the Bayesian result with respect to the following six
variations. First, the prior is taken flat in $\sin 2\beta_s^{J/\psi\phi}$
rather than flat in $\beta_s^{J/\psi\phi}$; second, the prior is taken flat
in $\cos\delta_\perp$; and third, the prior is taken flat in $\cos\delta_\parallel$.
Fourth, all three conditions are applied together at the
same time. Fifth, the prior is taken flat in the amplitudes
$A_\parallel(0)$ and $A_\perp(0)$ rather than in their squares. Finally,
the mixing-induced CP violation constraint is taken as a
Gaussian rather than a flat constraint. The effect of changing
the priors on the 68% credibility intervals on $\beta_s^{J/\psi\phi}$
is summarized in Table V. Modest changes are observed
for the unconstrained result and the result with $|\Gamma_{12}^s|$
constrained to $2|\Gamma_{12}^s| = (0.090 \pm 0.024)$ ps$^{-1}$ [9]. Only the
effect on the first $\beta_s^{J/\psi\phi}$ credibility interval is shown, since
the second interval can be trivially derived from the numbers
in Table V.
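
As an illustration of the Jacobian weighting, consider the first variation: a prior flat in $\sin 2\beta_s^{J/\psi\phi}$ corresponds, in the default parameterization, to a prior proportional to $|d\sin 2\beta_s^{J/\psi\phi}/d\beta_s^{J/\psi\phi}| = 2|\cos 2\beta_s^{J/\psi\phi}|$. The sketch below applies this weight to a hypothetical chain that is flat in $\beta_s^{J/\psi\phi}$ on $[0, \pi/4]$ and checks that the weighted distribution becomes flat in $\sin 2\beta_s^{J/\psi\phi}$. The chain and the helper name are assumptions made for this example.

```python
import numpy as np


def flat_in_sin2beta_weight(beta_s):
    """Jacobian |d sin(2 beta_s) / d beta_s| = 2 |cos(2 beta_s)|, used as the
    per-state weight when switching to a prior flat in sin(2 beta_s)."""
    return 2.0 * np.abs(np.cos(2.0 * np.asarray(beta_s)))


rng = np.random.default_rng(seed=3)
# Hypothetical chain that is flat in beta_s (the default parameterization).
beta_samples = rng.uniform(0.0, np.pi / 4.0, size=200_000)
weights = flat_in_sin2beta_weight(beta_samples)

# After weighting, the distribution of sin(2 beta_s) should be uniform.
reweighted, _ = np.histogram(
    np.sin(2.0 * beta_samples), bins=5, range=(0.0, 1.0), weights=weights
)
print("weighted bin contents:", np.round(reweighted))
```

Credible intervals for the varied prior would then be rebuilt from the weighted histogram in $\beta_s^{J/\psi\phi}$ in the same way as for the default prior.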
FIG. 23. (color online). Two-dimensional posterior densities for the variables $\beta_s^{J/\psi\phi}$ and $\Delta\Gamma_s$ (left), and one-dimensional
posterior density for the variable $\beta_s^{J/\psi\phi}$ (right) including a prior density for mixing-induced CP violation. The dark-solid
(blue) and light-solid (red) contours show the 68% and 95% credible regions, respectively.

FIG. 24. (color online). Conditional posterior densities for $\delta_\perp$ versus $\delta_\parallel$ and for $\beta_s^{J/\psi\phi}$. The extra conditions applied to $\beta_s^{J/\psi\phi}$
(top row) and $\delta_\perp$ (bottom row) are shown on the left, and the resulting conditional probabilities are displayed on the right.
The theoretical prediction of Ref. [63] is indicated as a green ellipse in the bottom left plot. All plots in this figure are subject
to the constraint of mixing-induced CP violation. The dark-solid (blue) and light-solid (red) contours show the 68% and 95%
credible regions, respectively.
X. SUMMARY

In summary, we have presented a measurement of
CP violation in $B_s^0 \to J/\psi\phi$ decays using 6500 signal
events from a data sample with 5.2 fb$^{-1}$ integrated luminosity
collected with the CDF II detector operating
at the Tevatron $p\bar{p}$ collider. We find the CP-violating
phase to be within the range
$\beta_s^{J/\psi\phi} \in [0.02, 0.52] \cup [1.08, 1.55]$ at 68% confidence level
and within the interval
$[-\pi/2, -1.46] \cup [-0.11, 0.65] \cup [0.91, \pi/2]$ at 95% C.L.,
where the coverage property of the quoted interval is
guaranteed using a frequentist statistical analysis. Assuming
the standard model expectation for $\beta_s^{J/\psi\phi}$, the
probability to observe a fluctuation as large as in our
data or larger is given by a p-value of 0.30, corresponding
to about one Gaussian standard deviation. This result
shows less of a discrepancy with the SM expectation
than our previously published result using 1.3 fb$^{-1}$ of
integrated luminosity [15]. In comparison with the recent
measurement of $\beta_s^{J/\psi\phi}$ from the D0 collaboration
using a data sample based on 8 fb$^{-1}$ of integrated luminosity [19],
our result agrees within uncertainties but
constrains $\beta_s^{J/\psi\phi}$ to a narrower region. With $\Delta\Gamma_s$
constrained to its SM prediction, we find $\beta_s^{J/\psi\phi}$ in the range
$[0.05, 0.40] \cup [1.17, 1.49]$ at the 68% C.L. The measurement
of the CP-violating phase $\beta_s^{J/\psi\phi}$ is still statistics-limited.
It will improve with the final 2011 CDF dataset,
which approximately doubles the current integrated luminosity.

This analysis also incorporates the possibility of contributions
to the $B_s^0 \to J/\psi K^+K^-$ final state in the
region of the $\phi$ resonance from $B_s^0 \to J/\psi f_0$ and
$B_s^0 \to J/\psi K^+K^-$ (non-resonant) decays. We measure
the S-wave contribution over the mass interval
$1.009 < m(K^+K^-) < 1.028$ GeV/$c^2$, corresponding to the
selected $K^+K^-$ signal region, to be less than 6% (4%) at
the 95% (68%) confidence level. We do not confirm a
sizeable fraction of $(17.3 \pm 3.6)\%$ for the S-wave fraction
as quoted over almost the same $K^+K^-$ mass range in a
recent D0 publication [19].

Assuming the standard model prediction for the CP-violating
phase $\beta_s^{J/\psi\phi}$, we measure several other parameters
describing the $B_s^0$ system. These include the $B_s^0$
mean lifetime $\tau(B_s^0)$, the decay width difference $\Delta\Gamma_s$
between the heavy and light $B_s^0$ mass eigenstates, the
transversity amplitudes $|A_\parallel(0)|^2$ and $|A_0(0)|^2$, as well as
the strong phase $\delta_\perp$. The measurements
$\tau(B_s^0) = 1.529 \pm 0.025$ (stat) $\pm 0.012$ (syst) ps and
$\Delta\Gamma_s = 0.075 \pm 0.035$ (stat) $\pm 0.006$ (syst) ps$^{-1}$ are the most
precise measurements of these quantities using a single decay mode.
They are also in good agreement with the PDG world
averages. The measurements of the transversity amplitudes
are consistent with previous measurements in
the $B_s^0 \to J/\psi\phi$ system. Finally, we report an alternative
Bayesian analysis based on Markov chain integration that
gives results consistent with the frequentist approach.